Amazon EMR Update – Apache Spark 1.5.2, Ganglia, Presto, Zeppelin, and Oozie

  • Today we are announcing Amazon EMR release 4.2.0, which adds support for Apache Spark 1.5.2, Ganglia 3.6 for Apache Hadoop and Spark monitoring, and new sandbox releases for Presto (0.125), Apache Zeppelin (0.5.5), and Apache Oozie (4.2.0).

New Applications in Release 4.2.0
Amazon EMR provides an easy way to install and configure distributed big data applications in the Hadoop and Spark ecosystems on managed clusters of Amazon EC2 instances. You can create Amazon EMR clusters from the Amazon EMR Create Cluster page in the AWS Management Console, from the AWS Command Line Interface (CLI), or by using an SDK with the EMR API. In the latest release, we added support for new versions of several applications:

  • Spark 1.5.2 – Spark 1.5.2 was released on November 9th, and we’re happy to give you access to it within two weeks of general availability. This version is a maintenance release, with improvements to Spark SQL, SparkR, the DataFrame API, and miscellaneous enhancements and bug fixes. Also, Spark documentation now includes information on enabling wire encryption for the block transfer service. For a complete set of changes, view the JIRA. To learn more about Spark on Amazon EMR, click here.
  • Ganglia 3.6 – Ganglia is a scalable, distributed monitoring system that you can install on your Amazon EMR cluster to display Amazon EC2 instance-level metrics, which are also aggregated at the cluster level. We also configure Ganglia to ingest and display Hadoop and Spark metrics along with general resource utilization information from the instances in your cluster, and you can view the metrics over a variety of time spans. You can view these metrics using the Ganglia web-UI on the master node of your Amazon EMR cluster. To learn more about Ganglia on Amazon EMR, click here.
  • Presto 0.125 – Presto is an open-source, distributed SQL query engine designed for low-latency queries on large datasets in Amazon S3 and the Hadoop Distributed Filesystem (HDFS). Presto 0.125 is a maintenance release, with optimizations to SQL operations, performance enhancements, and general bug fixes. To learn more about Presto on Amazon EMR, click here.
  • Zeppelin 0.5.5 – Zeppelin is an open-source interactive and collaborative notebook for data exploration using Spark. You can use Scala, Python, SQL, or HiveQL to manipulate data and visualize results. Zeppelin 0.5.5 is a maintenance release, and contains miscellaneous improvements and bug fixes. To learn more about Zeppelin on Amazon EMR, click here.
  • Oozie 4.2.0 – Oozie is a workflow designer and scheduler for Hadoop and Spark. This version now includes Spark and HiveServer2 actions, making it easier to incorporate Spark and Hive jobs in Oozie workflows. Also, you can create and manage your Oozie workflows using the Oozie Editor and Dashboard in Hue, an application which offers a web-UI for Hive, Pig, and Oozie. Please note that in Hue 3.7.1, you must still use Shell actions to run Spark jobs. To learn more about Oozie in Amazon EMR, click here.

Launch an Amazon EMR Cluster with Release 4.2.0 Today
To create an Amazon EMR cluster with 4.2.0, select release 4.2.0 on the Create Cluster page in the AWS Management Console, or use the release label emr-4.2.0 when creating your cluster from the AWS CLI or by using an SDK with the EMR API.
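
As a rough illustration of the CLI route, the sketch below assembles an argument list for `aws emr create-cluster` with the emr-4.2.0 release label. The helper name `createClusterArgs`, the cluster name, the instance type, and the instance count are assumptions made for this example, not values from the announcement; adjust them to your own account and check the flags against the AWS CLI documentation before use.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// createClusterArgs is a hypothetical helper that builds the arguments for
// `aws emr create-cluster`. It only assembles strings; it does not call AWS.
func createClusterArgs(name, releaseLabel string, apps []string, instanceType string, instanceCount int) []string {
	args := []string{
		"emr", "create-cluster",
		"--name", name,
		"--release-label", releaseLabel,
		"--applications",
	}
	// Each application is passed in the CLI's Name=<application> shorthand.
	for _, app := range apps {
		args = append(args, "Name="+app)
	}
	args = append(args,
		"--instance-type", instanceType,
		"--instance-count", strconv.Itoa(instanceCount),
		"--use-default-roles",
	)
	return args
}

func main() {
	// Cluster name, instance type, and count are illustrative assumptions.
	args := createClusterArgs("spark-cluster", "emr-4.2.0",
		[]string{"Spark", "Ganglia"}, "m3.xlarge", 3)
	fmt.Println("aws " + strings.Join(args, " "))
}
```

The printed command can be pasted into a shell, or the slice could be handed to `exec.Command("aws", args...)` if you prefer to launch the CLI from Go.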

Jon Fritz, Senior Product Manager

  • Now Available: Version 1.0 of the AWS SDK for Go

    by Jeff Barr | in Developers
    Earlier this year, my colleague Peter Moon shared our plans to launch an AWS SDK for Go. As you will read in Peter’s guest post below, the SDK is now generally available! - Jeff;


    At AWS, we work hard to promote and serve the developer community around our products. This is one of the reasons we open-source many of our libraries and tools on GitHub, where we cherish the ability to directly communicate and collaborate with our developer customers. Of all the experiences we’ve had in the open source community, the story of how the AWS SDK for Go came about is one we particularly love to share.

    Since the day we took ownership of the project 10 months ago, community feedback and contributions have made it possible for us to progress through the experimental and preview stages, and today we are excited to announce that the AWS SDK for Go is now at version 1.0 and recommended for production use. Like many of our projects, the SDK follows Semantic Versioning, which means that starting from 1.0, you can upgrade the SDK within the same major version 1.x and have confidence your existing code will continue to work.
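
    To make the Semantic Versioning promise concrete, here is a minimal sketch of a check for whether two versions belong to the same stable major line. The function names `majorVersion` and `sameMajor` are hypothetical and are not part of the SDK; the sketch only compares MAJOR.MINOR.PATCH strings and does not check which version is newer.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorVersion extracts the major component from a "MAJOR.MINOR.PATCH"
// string. An optional leading "v" is accepted.
func majorVersion(v string) (int, error) {
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	if len(parts) != 3 {
		return 0, fmt.Errorf("not a MAJOR.MINOR.PATCH version: %q", v)
	}
	return strconv.Atoi(parts[0])
}

// sameMajor reports whether both versions share a major version of 1 or
// higher. Under Semantic Versioning, 0.x releases carry no compatibility
// guarantee, so they are never treated as compatible here.
func sameMajor(from, to string) (bool, error) {
	a, err := majorVersion(from)
	if err != nil {
		return false, err
	}
	b, err := majorVersion(to)
	if err != nil {
		return false, err
	}
	return a >= 1 && a == b, nil
}

func main() {
	for _, pair := range [][2]string{
		{"1.0.0", "1.4.2"},
		{"1.9.0", "2.0.0"},
		{"0.9.0", "0.10.0"},
	} {
		ok, err := sameMajor(pair[0], pair[1])
		fmt.Println(pair[0], "->", pair[1], "same major line:", ok, "error:", err)
	}
}
```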

    Since the Developer Preview announcement in June, we have added a number of key improvements to the SDK, including:

    • Sessions – Easily share configuration and request handlers between clients.
    • JMESPath support – Query and reshape complex API responses and other structures using simple expressions.
    • Paginators – Iterate over multiple pages of list-type API responses.
    • Waiters – Wait for asynchronous state changes in AWS resources.
    • Documentation – Revamped developer guide.

    Here’s a code sample that exercises some of these new features:

    // Create a session
    s := session.New(aws.NewConfig().WithRegion("us-west-2"))
    // Add a handler to print every API request for the session
    s.Handlers.Send.PushFront(func(r *request.Request) {
    	fmt.Printf("Request: %s/%s\n", r.ClientInfo.ServiceName, r.Operation)
    })
    // We want to start all instances in a VPC, so let's get their IDs first.
    ec2client := ec2.New(s)
    var instanceIDsToStart []*string
    describeInstancesInput := &ec2.DescribeInstancesInput{
    	Filters: []*ec2.Filter{
    		&ec2.Filter{
    			Name:   aws.String("vpc-id"),
    			Values: aws.StringSlice([]string{"vpc-82977de9"}),
    		},
    	},
    }
    // Use a paginator to easily iterate over multiple pages of response
    ec2client.DescribeInstancesPages(describeInstancesInput,
    	func(page *ec2.DescribeInstancesOutput, lastPage bool) bool {
    		// Use JMESPath expressions to query complex structures
    		ids, _ := awsutil.ValuesAtPath(page, "Reservations[].Instances[].InstanceId")
    		for _, id := range ids {
    			instanceIDsToStart = append(instanceIDsToStart, id.(*string))
    		}
    		return !lastPage
    	})
    // The SDK provides several utility functions for literal <--> pointer transformation
    fmt.Println("Starting:", aws.StringValueSlice(instanceIDsToStart))
    // Skipped for brevity here, but *always* handle errors in the real world 🙂
    ec2client.StartInstances(&ec2.StartInstancesInput{
    	InstanceIds: instanceIDsToStart,
    })
    // Finally, use a waiter function to wait until the instances are running
    ec2client.WaitUntilInstanceRunning(describeInstancesInput)
    fmt.Println("Instances are now running.") 
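
    The paginator callback in the sample returns `!lastPage` to request the next page and would return `false` to stop early. The sketch below imitates that callback contract with an in-memory slice instead of a real API, so it runs without AWS credentials. `eachPage` and `collectAll` are hypothetical names for this illustration and are not SDK functions.

```go
package main

import "fmt"

// eachPage splits items into pages of at most pageSize entries and calls fn
// once per page. Iteration stops as soon as fn returns false, mirroring the
// callback contract used by the paginator in the sample above.
func eachPage(items []string, pageSize int, fn func(page []string, lastPage bool) bool) {
	if pageSize <= 0 {
		return
	}
	for start := 0; start < len(items); start += pageSize {
		end := start + pageSize
		if end > len(items) {
			end = len(items)
		}
		lastPage := end == len(items)
		if !fn(items[start:end], lastPage) {
			return
		}
	}
}

// collectAll gathers every item across all pages.
func collectAll(items []string, pageSize int) []string {
	var out []string
	eachPage(items, pageSize, func(page []string, lastPage bool) bool {
		out = append(out, page...)
		return !lastPage // keep going until the final page has been seen
	})
	return out
}

func main() {
	// Placeholder identifiers, not real instance IDs.
	ids := []string{"i-1", "i-2", "i-3", "i-4", "i-5"}
	fmt.Println(collectAll(ids, 2))
}
```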
    

    We would like to again thank Coda Hale and our friends at Stripe for contributing the original code base and giving us a wonderful starting point for the AWS SDK for Go. Now that it is fully production-ready, we can’t wait to see all the innovative applications our customers will build with the SDK!


    Peter Moon, Senior Product Manager

  • AWS Device Farm Update – Test Web Apps on Mobile Devices

    by Jeff Barr | in AWS Device Farm
    If you build mobile apps, you know that you have two implementation choices. You can build native or hybrid applications that compile to an executable file. You can also build applications that run within the device’s web browser.

    We launched the AWS Device Farm in July with support for testing native and hybrid applications on iOS and Android devices (see my post, AWS Device Farm – Test Mobile Apps on Real Devices, to learn more).

    Today we are adding support for testing browser-based applications on iOS and Android devices. Many customers have asked for this option and we are happy to be able to announce it. You can now create a single test run that spans any desired combination of supported devices and makes use of the Appium Java JUnit or Appium Java TestNG frameworks (we’ll add additional frameworks over time; please let us know what you need).

    Testing a Web App
    I tested a simple web app. It opens amazon.com and searches for the string “Kindle”. I opened the Device Farm Console and created a new project (Test Amazon Site). Then I created a new run (this was my second test, so I called it Web App Test #2):

    Then I configured the test by choosing the test type (TestNG) and uploading the tests (prepared for me by one of my colleagues):

    The file (chrome-with-screenshot.zip) contains the compiled test and the dependencies (a bunch of JAR files):

    Next, I chose the devices. I had already created a “pool” of Android devices, so I used it:

    I started the run and then checked in on it a few minutes later:

    Then I inspected the output, including screenshots, from a single test:

    Available Now
    This new functionality is available now and you can start using it today! Read the Device Farm Documentation to learn more.