IBM Strengthens Effort to Support Open Source Spark for Machine Learning

IBM is providing substantial resources to the Apache Software Foundation’s Spark project to prepare the platform for machine learning tasks such as pattern recognition and object classification. The company plans to offer Spark as a service on Bluemix and has committed 3,500 researchers and developers to the platform’s maintenance and further development.

In 2009, the AMPLab at the University of California, Berkeley developed the Spark framework, which was open-sourced a year later and subsequently became an Apache project. The framework, which runs on a server cluster, can process data up to 100 times faster than Hadoop MapReduce. As data and analytics become embedded in business and society – from applications to the Internet of Things (IoT) – Spark delivers essential advances in large-scale data processing.

First, it significantly improves the performance of data-dependent applications. Second, it radically simplifies the development of intelligent solutions that are fed by that data. As part of its effort to accelerate innovation in the Spark ecosystem, IBM has decided to build Spark into its own predictive analytics and machine learning platforms.

IBM Watson Health Cloud will use Spark to give healthcare providers and researchers access to new population health data. At the same time, IBM will release its SystemML machine learning technology as open source. IBM is also collaborating with Databricks to advance Spark’s capabilities.

IBM will hire more than 3,500 researchers and developers to work on Spark-related projects in more than a dozen laboratories worldwide. Big Blue plans to open a Spark Technology Center in San Francisco for the data science and developer community. IBM will also offer Spark training to more than one million data scientists and data engineers through partnerships with DataCamp, AMPLab, Galvanize, MetiStream, and Big Data University.

A typical large corporation has hundreds or thousands of data sets residing in different databases across its systems. A data scientist can design an algorithm to plumb the depths of any one of those databases, but developing it may take 90 working days, and adapting it to run against another system can consume another quarter of work. Spark slashes that time: a Spark-based system can access and analyze any of the databases without additional development or delay.
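The claim above is essentially about decoupling analysis logic from the data source. As a toy illustration in plain Python (this is not Spark itself; the table and function names are invented for the example), the same analysis routine can run unchanged against any number of databases that expose a common interface:

```python
import sqlite3

def average_order_value(conn):
    # The analysis logic is written once, against a generic connection,
    # not against any particular database's layout or location.
    cur = conn.execute("SELECT AVG(amount) FROM orders")
    return cur.fetchone()[0]

# Two independent databases with the same logical schema
# (in-memory here; in practice they could live anywhere).
sources = []
for amounts in ([10.0, 20.0], [5.0, 15.0, 40.0]):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(a,) for a in amounts])
    sources.append(conn)

# The same algorithm runs against every source with no per-system rework.
results = [average_order_value(conn) for conn in sources]
print(results)  # [15.0, 20.0]
```

Spark generalizes this idea with a common DataFrame API over many storage systems, so the per-database adaptation work largely disappears.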

Another of Spark’s virtues is ease of use: developers can concentrate on designing the solution rather than building an engine from scratch. Spark advances large-scale data processing because it improves the performance of data-dependent applications, radically simplifies the development of intelligent solutions, and provides a platform capable of unifying all kinds of information in real-world workflows.

Many experts consider Spark the successor to Hadoop, but its adoption remains slow. Spark works very well for machine learning tasks, which normally require running large clusters of computers. The latest version of the platform, released recently, expands the set of machine learning algorithms it can run.

Amazon EMR Update – Apache Spark 1.5.2, Ganglia, Presto, Zeppelin, and Oozie

  • Today we are announcing Amazon EMR release 4.2.0, which adds support for Apache Spark 1.5.2, Ganglia 3.6 for Apache Hadoop and Spark monitoring, and new sandbox releases for Presto (0.125), Apache Zeppelin (0.5.5), and Apache Oozie (4.2.0).

New Applications in Release 4.2.0
Amazon EMR provides an easy way to install and configure distributed big data applications in the Hadoop and Spark ecosystems on managed clusters of Amazon EC2 instances. You can create Amazon EMR clusters from the Create Cluster page in the AWS Management Console, with the AWS Command Line Interface (CLI), or by using an SDK with the EMR API. In the latest release, we added support for several new versions of applications:

  • Spark 1.5.2 – Spark 1.5.2 was released on November 9th, and we’re happy to give you access to it within two weeks of general availability. This version is a maintenance release, with improvements to Spark SQL, SparkR, the DataFrame API, and miscellaneous enhancements and bug fixes. Also, Spark documentation now includes information on enabling wire encryption for the block transfer service. For a complete set of changes, view the JIRA. To learn more about Spark on Amazon EMR, click here.
  • Ganglia 3.6 – Ganglia is a scalable, distributed monitoring system which can be installed on your Amazon EMR cluster to display Amazon EC2 instance level metrics which are also aggregated at the cluster level. We also configure Ganglia to ingest and display Hadoop and Spark metrics along with general resource utilization information from instances in your cluster, and metrics are displayed in a variety of time spans. You can view these metrics using the Ganglia web-UI on the master node of your Amazon EMR cluster. To learn more about Ganglia on Amazon EMR, click here.
  • Presto 0.125 – Presto is an open-source, distributed SQL query engine designed for low-latency queries on large datasets in Amazon S3 and the Hadoop Distributed Filesystem (HDFS). Presto 0.125 is a maintenance release, with optimizations to SQL operations, performance enhancements, and general bug fixes. To learn more about Presto on Amazon EMR, click here.
  • Zeppelin 0.5.5 – Zeppelin is an open-source interactive and collaborative notebook for data exploration using Spark. You can use Scala, Python, SQL, or HiveQL to manipulate data and visualize results. Zeppelin 0.5.5 is a maintenance release, and contains miscellaneous improvements and bug fixes. To learn more about Zeppelin on Amazon EMR, click here.
  • Oozie 4.2.0 – Oozie is a workflow designer and scheduler for Hadoop and Spark. This version now includes Spark and HiveServer2 actions, making it easier to incorporate Spark and Hive jobs in Oozie workflows. Also, you can create and manage your Oozie workflows using the Oozie Editor and Dashboard in Hue, an application which offers a web-UI for Hive, Pig, and Oozie. Please note that in Hue 3.7.1, you must still use Shell actions to run Spark jobs. To learn more about Oozie in Amazon EMR, click here.

Launch an Amazon EMR Cluster with Release 4.2.0 Today
To create an Amazon EMR cluster with 4.2.0, select release 4.2.0 on the Create Cluster page in the AWS Management Console, or use the release label emr-4.2.0 when creating your cluster from the AWS CLI or using an SDK with the EMR API.
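For example, a CLI invocation might look like the sketch below. The cluster name, instance type, instance count, and key pair are placeholders; the release label and application names follow the release notes above:

```shell
# Placeholder name, key pair, and sizing -- adjust for your account.
aws emr create-cluster \
  --name "spark-4.2.0-demo" \
  --release-label emr-4.2.0 \
  --applications Name=Spark Name=Ganglia Name=Presto-Sandbox \
                 Name=Zeppelin-Sandbox Name=Oozie-Sandbox \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --ec2-attributes KeyName=my-key-pair \
  --use-default-roles
```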

Jon Fritz, Senior Product Manager

  • Now Available: Version 1.0 of the AWS SDK for Go

    by Jeff Barr | in Developers | Permalink
    Earlier this year, my colleague Peter Moon shared our plans to launch an AWS SDK for Go. As you will read in Peter’s guest post below, the SDK is now generally available! — Jeff;

    At AWS, we work hard to promote and serve the developer community around our products. This is one of the reasons we open-source many of our libraries and tools on GitHub, where we cherish the ability to directly communicate and collaborate with our developer customers. Of all the experiences we’ve had in the open source community, the story of how the AWS SDK for Go came about is one we particularly love to share.

    Since the day we took ownership of the project 10 months ago, community feedback and contributions have made it possible for us to progress through the experimental and preview stages, and today we are excited to announce that the AWS SDK for Go is now at version 1.0 and recommended for production use. Like many of our projects, the SDK follows Semantic Versioning, which means that starting from 1.0, you can upgrade the SDK within the same major version 1.x and have confidence that your existing code will continue to work.
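The compatibility promise described above can be stated mechanically. Here is a minimal sketch (not part of the SDK; the function names are invented for illustration) of the Semantic Versioning rule that any upgrade within major version 1.x is expected to be safe:

```python
def parse(version):
    """Split a semantic version string into integer (major, minor, patch)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def safe_upgrade(current, candidate):
    # Under Semantic Versioning, upgrades within the same major version
    # must not break existing code; a major-version bump may.
    return parse(candidate)[0] == parse(current)[0] and parse(candidate) >= parse(current)

print(safe_upgrade("1.0.0", "1.4.2"))  # True: still within major version 1
print(safe_upgrade("1.4.2", "2.0.0"))  # False: a major bump may break code
```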

    Since the Developer Preview announcement in June, we have added a number of key improvements to the SDK, including:

    • Sessions – Easily share configuration and request handlers between clients.
    • JMESPath support – Query and reshape complex API responses and other structures using simple expressions.
    • Paginators – Iterate over multiple pages of list-type API responses.
    • Waiters – Wait for asynchronous state changes in AWS resources.
    • Documentation – Revamped developer guide.

    Here’s a code sample that exercises some of these new features:

    // Create a session
    s := session.New(aws.NewConfig().WithRegion("us-west-2"))

    // Add a handler to print every API request for the session
    s.Handlers.Send.PushFront(func(r *request.Request) {
        fmt.Printf("Request: %s/%s\n", r.ClientInfo.ServiceName, r.Operation.Name)
    })

    // We want to start all instances in a VPC, so let's get their IDs first.
    ec2client := ec2.New(s)
    var instanceIDsToStart []*string
    describeInstancesInput := &ec2.DescribeInstancesInput{
        Filters: []*ec2.Filter{
            {
                Name:   aws.String("vpc-id"),
                Values: aws.StringSlice([]string{"vpc-82977de9"}),
            },
        },
    }

    // Use a paginator to easily iterate over multiple pages of response
    _ = ec2client.DescribeInstancesPages(describeInstancesInput,
        func(page *ec2.DescribeInstancesOutput, lastPage bool) bool {
            // Use JMESPath expressions to query complex structures
            ids, _ := awsutil.ValuesAtPath(page, "Reservations[].Instances[].InstanceId")
            for _, id := range ids {
                instanceIDsToStart = append(instanceIDsToStart, id.(*string))
            }
            return !lastPage
        })

    // The SDK provides several utility functions for literal <--> pointer transformation
    fmt.Println("Starting:", aws.StringValueSlice(instanceIDsToStart))

    // Skipped for brevity here, but *always* handle errors in the real world
    _, _ = ec2client.StartInstances(&ec2.StartInstancesInput{
        InstanceIds: instanceIDsToStart,
    })

    // Finally, use a waiter function to wait until the instances are running
    _ = ec2client.WaitUntilInstanceRunning(&ec2.DescribeInstancesInput{
        InstanceIds: instanceIDsToStart,
    })
    fmt.Println("Instances are now running.")

    We would like to again thank Coda Hale and our friends at Stripe for contributing the original code base and giving us a wonderful starting point for the AWS SDK for Go. Now that it is fully production-ready, we can’t wait to see all the innovative applications our customers will build with the SDK!

    For more information please see:

    Peter Moon, Senior Product Manager

  • AWS Device Farm Update – Test Web Apps on Mobile Devices

    by Jeff Barr | in AWS Device Farm | Permalink
    If you build mobile apps, you know that you have two implementation choices. You can build native or hybrid applications that compile to an executable file. You can also build applications that run within the device’s web browser.

    We launched the AWS Device Farm in July with support for testing native and hybrid applications on iOS and Android devices (see my post, AWS Device Farm – Test Mobile Apps on Real Devices, to learn more).

    Today we are adding support for testing browser-based applications on iOS and Android devices. Many customers have asked for this option and we are happy to be able to announce it. You can now create a single test run that spans any desired combination of supported devices and makes use of the Appium Java JUnit or Appium Java TestNG frameworks (we’ll add additional frameworks over time; please let us know what you need).

    Testing a Web App
    I tested a simple web app. It opens the Amazon site and searches for the string “Kindle”. I opened the Device Farm Console and created a new project (Test Amazon Site). Then I created a new run (this was my second test, so I called it Web App Test #2):

    Then I configured the test by choosing the test type (TestNG) and uploading the tests (prepared for me by one of my colleagues):

    The file contains the compiled test and the dependencies (a bunch of JAR files):

    Next, I chose the devices. I had already created a “pool” of Android devices, so I used it:

    I started the run and then checked in on it a few minutes later:

    Then I inspected the output, including screen shots, from a single test:

    Available Now
    This new functionality is available now and you can start using it today! Read the Device Farm Documentation to learn more.

Apache Spark 1.5.2 and new versions of Ganglia monitoring, Presto, Zeppelin, and Oozie now available in Amazon EMR

You can now deploy new applications on your Amazon EMR cluster. Amazon EMR release 4.2.0 now offers Ganglia 3.6, an upgraded version of Apache Spark (1.5.2), and upgraded sandbox releases of Apache Oozie (4.2.0), Presto (0.125), and Apache Zeppelin (0.5.5). Ganglia provides resource utilization monitoring for Hadoop and Spark. Oozie 4.2.0 includes several new features, such as adding Spark actions and HiveServer2 actions in your Oozie workflows. Spark 1.5.2, Presto 0.125, and Zeppelin 0.5.5 are maintenance releases, and contain bug fixes and other optimizations.

You can create an Amazon EMR cluster with release 4.2.0 by choosing release label “emr-4.2.0” from the AWS Management Console, AWS CLI, or SDK. You can specify Ganglia, Spark, Oozie-Sandbox, Presto-Sandbox, and Zeppelin-Sandbox to install these applications on your cluster. To view metrics in Ganglia or create a Zeppelin notebook, you can connect to the web-based UIs for these applications on the master node of your cluster. Please visit the Amazon EMR documentation for more information about Ganglia 3.6, Spark 1.5.2, Oozie 4.2.0, Presto 0.125, and Zeppelin 0.5.5.

3 key ways Hadoop is evolving

Hot themes at the Strata+Hadoop World conference reflect the shift for the big data platform

The Strata+Hadoop World 2015 conference in New York this week was subtitled “Make Data Work,” but given how the Hadoop world has evolved over the past year (even over the past six months), another apt subtitle might have been “See Hadoop Change.”

Here are three of the most significant recent trends in Hadoop, as reflected by the show’s roster of breakout sessions, vendors, and technologies.

Spark is so hot it had its own schedule track, labeled “Spark and Beyond,” with sessions on everything from using the R language with Spark to running Spark on Mesos.

Some of the enthusiasm comes from Cloudera — a big fan of Spark — and its sponsorship for the show. But Spark’s rising popularity is hard to ignore.

Spark’s importance stems from how it offers self-service data processing, by way of a common API, no matter where that data is stored. (At least half of the work done with Spark isn’t within Hadoop.) Arsalan Tavakoli-Shiraji, vice president of customer engagement for Databricks, Spark’s chief commercial proponent, spoke of how those tasked with getting business value out of data “eagerly want data, whether they’re using SQL, R, or Python, but hate calling IT.”

Rob Thomas, IBM’s vice president of product development for IBM Analytics, cited Spark as a key in the shift away from “a world of infrastructure to a world of insight.” Hadoop data lakes often become dumping grounds, he claimed, lacking the business value that Spark can provide.

The pitch for Hadoop is no longer about it being a data repository — that’s a given — it’s about having skilled people and powerful tools to plug into it in order to get something useful out.

Two years ago, the keynote speeches at Strata+Hadoop were all about creating a single repository for enterprise data. This time around, the words “data lake” were barely mentioned in the keynotes — and only in a derogatory tone. Talk of “citizen data scientists,” “using big data for good,” and smart decision making with data was offered instead.

What happened to the old message? It was elbowed aside by the growing realization that the culture of self-service tools for data science on Hadoop offers more real value than the ability to aggregate data from multiple sources. If the old Hadoop world was about free-form data storage, the new Hadoop world is (ostensibly) about free-form data science.

The danger is that terms like “data scientist” become too generic, in the same way that “machine learning” was watered down through overly broad use.

Hadoop has become a proving ground for new tech

Few would dispute that Hadoop remains important, least of all the big names behind the major distributions. But attention and excitement seem less focused on Hadoop as a whole than on the individual pieces emerging from Hadoop’s big tent — pieces that are being put to use creating entirely new products.

Spark is the obvious example, both for what it can do and how it goes about doing it. Spark’s latest incarnation features major workarounds for issues with the JVM’s garbage collection and memory management systems, technologies that have exciting implications outside of Spark.

But other new-tech-from-Hadoop examples are surfacing: Kafka, the Hadoop message-broker system for high-speed data streams, is at the heart of products like Mesosphere Infinity and Salesforce’s IoT Cloud. If a technology can survive deployment at scale within Hadoop, the conventional wisdom goes, it’s probably a good breakthrough.

Unfortunately, because Hadoop is such a fertile breeding ground, it’s also becoming more fragmented. Efforts to provide a firmer definition of what’s inside the Hadoop tent, like the Open Data Platform Initiative, have inspired as much dissent and division as agreement and consensus. And new additions to the Hadoop toolbox risk further complicating an already dense picture. Kudu, the new Hadoop storage engine championed by Cloudera as a way to combine the best of HDFS and HBase, isn’t compatible with HDFS’ protocols — yet.

There’s little sign that the mix of ingredients that make up Hadoop will become any less ad hoc or variegated with time, thanks to the slew of vendors vying to deliver their own spin on the platform. But whatever becomes of Hadoop, some of its pieces have already proven they can thrive on their own.

How a Cloud Antivirus Works


Panda Cloud Antivirus scans your computer at regular intervals and checks it against the latest malware threats in its database.

Screenshot by Stephanie Crawford for HowStuffWorks

Whether you have years of computing behind you, or you’ve just bought your first laptop or desktop, you’re probably familiar with the need to protect computers from viruses. A virus is a software program that installs itself on your computer and makes undesirable changes to the data on your computer. Though there are rare viruses designed to target offline computers, we’re talking about malicious software (malware) you can pick up from the Internet.

To prevent malware from attacking your data, you can use antivirus software. One antivirus option is a technology called cloud antivirus. Cloud antivirus software does most of its processing elsewhere on the Internet rather than on your computer’s hard drive. Internet technology like cloud computing has made such innovations both possible and affordable.

Cloud antivirus software consists of client and Web service components working together. The client is a small program running on your local computer, which scans the system for malware. Full locally installed antivirus applications are notorious resource hogs, but cloud antivirus clients require only a small amount of processing power.

The Web service behind cloud antivirus is software running on one or more servers somewhere on the Internet. The Web service handles most of the data processing so your computer doesn’t have to process and store massive amounts of virus information. At regular intervals, the client will scan your computer for any malware listed in the Web service’s database.
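A minimal sketch of that client/service split in Python may help. The hash database and file contents below are invented for illustration, and a real service would expose a network API with far richer detection logic than an in-process lookup of known-bad fingerprints:

```python
import hashlib

# Stand-in for the Web service's continuously updated malware database:
# a set of SHA-256 fingerprints of known-bad files.
KNOWN_BAD_HASHES = {
    hashlib.sha256(b"malicious payload").hexdigest(),
}

def fingerprint(data: bytes) -> str:
    """The lightweight client only hashes local files; the heavy
    signature database lives on the service side."""
    return hashlib.sha256(data).hexdigest()

def scan(files: dict) -> list:
    """Return the names of files whose fingerprints the service flags."""
    return [name for name, data in files.items()
            if fingerprint(data) in KNOWN_BAD_HASHES]

local_files = {
    "notes.txt": b"meeting at noon",
    "invoice.exe": b"malicious payload",
}
print(scan(local_files))  # ['invoice.exe']
```

The design choice here is the point of the article: the client stays small because it only computes fingerprints, while the expensive part (maintaining and querying the threat database) happens on the service.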

Here’s a summary of the advantages cloud antivirus has over traditional, locally installed antivirus software:

  • You have access to the latest data about malware within minutes of the cloud antivirus Web service learning about it. There’s no need to continually update your antivirus software to ensure you’re protected from the latest threats.
  • The cloud antivirus client is small, and it requires little processing power as you go on with your day-to-day activities online.
  • It’s free! You can get an impressive level of virus protection from the free versions of cloud antivirus software. You can also purchase upgrades for additional utilities and support, for prices that are competitive with popular local-only antivirus applications.

Now that you know what cloud antivirus is, let’s look at the features of cloud antivirus software and how you can use them to keep your system clean.

SAP’s HANA Vora Query Engine Harnesses Spark, Hadoop for Data Analysis

SAP says its new HANA Vora query engine extends the Apache Spark processing engine to provide the data analytics muscle to pull business insights from all types of big data.


SAP is introducing a new in-memory query engine called HANA Vora that leverages the Apache Spark open source data processing engine and Hadoop to mine business insights from vast stores of data produced by machines, business transactions and sensors. The name Vora, short for “voracious,” according to the company, reflects the product’s ability to apply big data analytics techniques to enormous quantities of data.

“HANA Vora plugs into Apache Spark to bring business data awareness, performance and real-time analytics to the enormous volumes of data that industries of all types will generate just in the next five years,” said Quentin Clark, SAP’s chief technology officer, in a video introducing Vora. Clark cited estimates that global businesses will generate 44 trillion gigabytes of data by 2020. Vora will enable enterprises to merge this vast quantity of new data with existing enterprise data sets to “make meaning out of all that data.”

SAP says its goal with HANA Vora is to relieve much of the complexity and grunt work with Spark and Hadoop to produce meaningful business insights from distributed data sets.

The trick is to put big data analytics in context with an understanding of business processes in order to pull business insights from the data. That is what SAP says HANA Vora will achieve.

Financial services, health care, manufacturing and telecommunications are just a few of the industries where big data analytics can produce significant improvements to business processes, according to SAP. For example, Vora can be used in the telecommunications industry to relieve network congestion by analyzing traffic patterns. It can also be used to detect anomalies in large volumes of financial transactions that indicate the possibility of fraud.

The company plans to release HANA Vora to customers in late September. A cloud-based developer edition will also be available. SAP’s introduction of Vora is an “interesting strategic and practical move that could pay dividends over time,” said Charles King, principal analyst with Pund-IT.

“In essence, Vora is an in-memory query processor that can be used to speed queries of unstructured data in Hadoop/Apache Spark environments, as well as structured information in common enterprise data sources, including SAP HANA. That could be a very attractive proposition to SAP’s large enterprise customers.”

The introduction of Vora is fairly timely because “Apache Spark is a very hot topic right now and other vendors, including IBM, are making sizable investments” in Spark and Hadoop technology, King noted. SAP is bringing Vora to market at a time when adoption of Spark is still in its early stages, and is making it work with other SAP technology such as HANA, King noted.

SAP also announced application development enhancements to the SAP HANA Cloud Platform that will enable enterprises to speed up the development of a variety of applications. One of the enhancements enables enterprises to develop applications that gather and analyze data collected from sensors and industrial control devices connected to the Internet of Things.

Services available on this platform include device data connectivity, device management and data synchronization features.

SAP also announced new business services running on the HANA Cloud Platform. These include a new SAP global tax calculation service that is going into limited trial in September. It allows companies to calculate taxes for more than 75 countries around the world. The service supports many tax functions, including withholding taxes, value-added taxes and import/export taxes, and it keeps pace with changes in tax laws that alter tax calculations.

The company also announced a public beta test program for SAP Hybris-as-a-Service on the HANA Cloud Platform. Hybris is a cloud platform for building business services of virtually any kind. The Hybris-as-a-Service platform is open to independent software vendors, enterprise IT organizations and systems providers, which can build their own cloud services and market them to customers or other application developers.

Microsoft Azure VMs Aimed at Bigger Enterprise Cloud Workloads

Microsoft is making more room on its cloud for big enterprise application workloads.

In January, the company announced the general availability of high-performance G-Series virtual machines (VMs) for Azure, offering up to 32 virtual CPUs powered by cutting-edge Intel Xeon server processors, 6TB of storage capacity provided by solid-state drives (SSDs) and 448GB of memory. According to Microsoft, enterprise adoption is brisk, with a 50 percent increase in use over the past three months. Now, the Redmond, Wash.-based tech giant and cloud provider is aiming even higher.

“Today, we’re excited to announce a variant of G-series, the GS-series, which combines the compute power of G-series with the performance of Premium Storage to create powerful VMs for your most storage- and compute-intensive applications,” wrote Corey Sanders, partner director of program management at Microsoft Azure, in a Sept. 2 announcement. Still powered by Intel Xeon E5 v3 processors, the new Azure VMs bring Premium Storage support into the mix.

GS-series VMs, which are compatible with both Windows and Linux, “can have up to 64TB of storage, provide 80,000 IOPS (storage I/Os per second) and deliver 2,000 [megabytes per second] of storage throughput,” Sanders said. Microsoft claims that, compared to rivals, the new VMs offer more than double the disk throughput and network bandwidth (20Gbps).

The new offering is aimed at large database-driven workloads, Sanders noted. “Relational databases like SQL Server and MySQL, NoSQL databases like MongoDB and data warehouses can all have significant performance gains when run on GS-series,” he said.

Businesses seeking to grow or enhance the performance of their existing applications can use the VMs to trade up. “You can also use GS-series to significantly scale up the performance of enterprise applications, such as Exchange and Dynamics,” Sanders added.

GS-series VMs are available in five sizes. The starter size (Standard_GS1) provides two virtual CPUs, 26GB of memory, a storage performance rating of 5,000 IOPS and a maximum disk bandwidth of 125MB per second. The top-tier Standard_GS5 supports up to 32 virtual CPUs and 448GB of memory, providing the performance Sanders used to illustrate the technology’s cloud-processing potential.

For businesses that don’t require quite as much cloud computing horsepower, Microsoft also announced looming price cuts for its D-Series and DS-Series VMs. “We’re continuously striving to make these more accessible at lower price points, and are pleased to announce today that we’re reducing the prices on D-series and DS-series instances by as much as 27 percent,” Sanders said. The new pricing goes into effect on Oct. 1.

Azure VM customers are also getting a new diagnostic tool to aid those suffering from boot or runtime failures. The tool displays the serial and console output of running VMs.

Google Raises the Cloud Apps Bar With Powerful New Docs Features

NEWS ANALYSIS: In its never-ending battle for dominance in the cloud applications market, Google has brought some powerful and mobile-friendly new features to Google Docs.

Imagine a voice dictation system that actually works for more than short messages. Imagine that it could even transcribe a long conversation or a brainstorming session. That’s been a sort of Holy Grail for voice dictation for a long time. But now it appears that Google has managed to pull it off.

However, it turns out that voice dictation is just one of the major new features to come out of the new release of Google Docs announced on Sept. 2. Google also announced a new Research function and a new Explore function.

Research is designed to integrate the process of search with the ability to cut and paste, so that you can, for example, add details from an online encyclopedia to a paper you’re writing on a tablet—along with photos—and do it all quickly and easily with a minimum of touch actions.

Explore is designed to make sense of data that you’ve stored in the Google Sheets spreadsheet app and to display it in a way that makes sense. Of course, that data still has to get into the spreadsheet somehow and you still have to tell Explore what data you want to look at, but according to the details released by Google today, the rest is automatic.

Making document creation and collaboration easier for mobile users is a major focus of the new version of Google Docs, but not everything works with every mobile device. Research and Explore work only on Android, while voice dictation and typing work only on Android and iOS mobile devices. All of the features will work on PCs running Windows and on Macs as long as you use the Chrome browser.

Most of the changes to Google Docs are evolutionary rather than revolutionary. For example, there’s a new feature for handling changes in collaborative documents where you can see the newest changes rather than everything that’s been changed since the last time changes were accepted. Google also has incorporated a vast array of new templates with some new themes and more flexibility. These should make the basic documents, slides and spreadsheets more personalized for your organization.

There are also some big changes in collaboration. One of the biggest changes to Google Docs is a “Share to Classroom” extension for Google Chrome. This feature is aimed at the education market, but would prove just as useful in any situation where a group is being asked to look at the same Web page at the same time. Google uses the example of a fourth grade teacher in a blog entry on the sharing feature, but it would work equally well in a corporate training environment.

Intel seeks to boost its OpenStack cred with Mirantis move

Intel's financial backing for Mirantis will aid OpenStack -- and help Intel find future customers for its hardware-based virtualization tech

OpenStack vendor Mirantis has snagged another round of nine-digit funding, less than a year after the last influx of cash. The biggest and most familiar name in the current roster of financial supporters: Intel, itself an OpenStack founding member and one of Mirantis’ original funding suppliers.

What brings Intel to Mirantis, specifically? Given Intel’s recent work, it’s not only about OpenStack. It’s also about supporting the chipmaker’s current and future improvements in hardware-level virtualization techniques.

With precious few exceptions, enterprise data centers run on Intel iron, and Intel likes to make a continuing case for how its hardware — when coupled with the right software — can provide advantages not found elsewhere.

Case in point: the Clear Linux Project for Intel Architecture, originally announced as a container-oriented OS designed to take advantage of processor extensions found in more recent Intel hardware. CoreOS took a shine to the idea and has since added support for the features to its container runtime.

Thus, the reasons for Intel to further fund Mirantis would be twofold. One, Intel could provide contributions to the project to more directly leverage its virtualization enhancements. Two, it could leverage feedback from a hopefully growing base of Mirantis customers to figure out what hardware-level virtualization features are worth baking into the next generation of CPUs.

Part of Intel’s pitch for getting more deeply involved with OpenStack has been its Cloud for All initiative, for “accelerat[ing] enterprise-ready, easy-to-deploy software defined infrastructure (SDI) solutions.” Intel’s original partner with Cloud for All was Rackspace — another key OpenStack player — so this may simply be Intel partnering with the most visible presence in various major OpenStack categories: cloud vendors, distribution vendors, and so on.

However, Intel is repeating a canard that’s been heard many times before — that the best way to get OpenStack into more hands is to make it easier to work with and deploy. The problems with OpenStack clearly run deeper and include larger questions of how much enterprise clouds will mix public and private resources in the future.

Mirantis seems worthy of tapping for such strategy, as it’s survived the shakeouts that claimed earlier OpenStack startups (Nebula, for instance). But any blending of strengths between the two will need to do more than repeat history.

Even if the answers to those questions are straight out of the existing OpenStack playbook, Mirantis and Intel both stand to benefit — the former by an infusion of cash and expertise, the latter by having a conduit back to potential future customers for cutting-edge virtualization technology. But revolutionizing OpenStack as a whole seems out of the hands of any one, or even two, parties.

VMware Integrated OpenStack plays to VMware user base

One year ago VMware announced VMware Integrated OpenStack (VIO), designed to use OpenStack to bolster VMware — and vice versa. Now VIO 2.0 has arrived, with the same mission: It lets users with VMware expertise and product licensing leverage OpenStack without undue distraction or pain.

Ease of use and deployment are selling points for most commercial OpenStack products, but VMware continues to make a case that its product line — and its existing customer base — set VIO apart.

VIO 2.0 is built on the OpenStack Kilo release, which shipped earlier this year, and adds support for Ceilometer (OpenStack’s telemetry component), autoscaling via the Heat orchestration system, load balancing as a service (LBaaS), and a host of other under-the-hood changes.
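For readers unfamiliar with Heat, autoscaling is declared in a HOT template rather than configured imperatively. The following is a minimal, illustrative sketch of a scaling group with a scale-up policy; the image and flavor names are generic placeholders, not anything specific to VIO:

```yaml
heat_template_version: 2014-10-16

resources:
  # A group of servers that Heat can grow or shrink on demand.
  web_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 3
      resource:
        type: OS::Nova::Server
        properties:
          image: cirros        # placeholder image name
          flavor: m1.small     # placeholder flavor name

  # A policy that adds one server to the group when triggered
  # (for example, by a Ceilometer alarm).
  scale_up:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: web_group }
      scaling_adjustment: 1
```

In practice the scaling policy would be wired to a telemetry alarm (which is where the new Ceilometer support comes in), so the group grows and shrinks in response to measured load.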

For OpenStack admins, the biggest change is VMware’s self-proclaimed seamless upgrade system. Upgrading an OpenStack installation has long been cited as a pain point, although some issues have been resolved incrementally over time (albeit more for individual components than for OpenStack as a whole).

VMware is determined not to let installation be a sticking point for its customers and promises the ability to roll back installs if anything goes awry, as well as back up and restore for OpenStack’s entire services and configuration set.
Keep your customers close(r)

With VIO, VMware never set out to capture an entirely new audience for OpenStack. Rather, VMware wants to protect itself from attrition since OpenStack has long been seen as a cheap way to get the majority of VMware’s functionality. If a chunk of VMware’s audience — present and future — eyes OpenStack as an escape hatch, it makes sense to give them an incentive to remain where they are or opt for VMware in the first round.

Boris Renski, co-founder of the OpenStack distribution Mirantis, has gone so far as to wager that VMware may draw more users for OpenStack than Red Hat. But he also seems aware that will most likely come through the existing VMware user base.

“Their sophisticated customer base wants to get more value out of their investments in VMware while also wanting the flexibility of working with alternative open source cloud solutions like OpenStack,” Renski told Matt Asay late last year.

VMware’s experiment with OpenStack as a value-add has plenty of room for growth. Some new features — load balancing, for instance — seem designed to appeal to users who want to build modern microservice-based architectures with OpenStack as a management framework, but also want to keep one foot in legacy VMs (which Charles Babcock of InformationWeek cited as a possibility).

But keeping VMware’s existing customer base happy, while letting them indulge in OpenStack without wandering off, remains the main mission.