AWS Developer Tools Blog

S3 workflows simplified with Java 8 streams

Of the many changes brought about with Java 8, the Stream API is perhaps one of the most exciting.  Java 8 streams, which are unrelated to Java’s I/O streams, allow you to perform a series of mutations and transformations against a collection of items.  You can think of a stream as a form of data pipeline, where a collection of data is passed as input and a series of defined steps are performed against that data.  Streams can produce a result in the form of a new collection or directly perform actions against each element of the stream.  Streams can be directly created from multiple sources, including directly specified values, from a collection, or from a Spliterator using a utility method.

The following are some very simple examples of how streams can be used with the Amazon S3 Java client.

Creating a Stream from results

Iterable<S3ObjectSummary> objectSummaries = S3Objects.inBucket(s3Client, "myBucket");
Stream<S3ObjectSummary> objectStream = StreamSupport.stream(objectSummaries.spliterator(), false);

We first make a call through the S3 client to grab a paginated Iterable of result object summaries from the objects in a bucket.  This transparently handles iteration across multiple pages by making additional calls to the service, as needed, to retrieve subsequent result pages.  Now it’s time to create a stream to process our results.  Although Java 8 does not provide a direct way to generate a stream from an Iterable, it does provide a utility class (StreamSupport) with methods to help you do this.  We’re able to use this to pass in a Spliterator (also new to Java 8, it helps facilitate parallelized iteration) grabbed off the Iterable to generate a stream.

Finding the total size of all objects in a bucket

This is a simple example of how using Java 8 streams can reduce the verbosity of an operation.  It’s not uncommon to want to compute the total size of all objects in a bucket and historically one might iterate through the results and keep a running tally of cumulative sizes of each object.

long totalBucketSize = 0L;
for (S3ObjectSummary summary : objectSummaries) {
    totalSize += summary.getSize();
}

Using a stream gives you a neat alternative that does the same thing.

long totalBucketSize = objectStream.mapToLong(obj -> obj.getSize()).sum();

Calling mapToLong on our stream produces a LongStream generated from the results of applying a function (in this case, one that simply grabs the object size from each summary) which allows us to perform subsequent stream operations.  Calling sum (which is a stream terminal reduction operation) returns the sum of all elements of the stream.

Delete all bucket objects older than a specified date

You might regularly run a job that goes through the objects in a bucket and deletes those that were last modified before some date.  Again, streams allow us to perform this operation concisely.  Here we’ll say that we want to delete any objects that were last modified over 30 days ago.

Calendar c = Calendar.getInstance();
c.add(Calendar.DAY_OF_MONTH, -30);
Date cutoffDate = c.getTime();

objectStream.filter(obj -> obj.getLastModified().before(cutoffDate))
    .forEach(obj -> s3Client.deleteObject("myBucket", obj.getKey()));

First we generate our target cutoff date.  In this example we call filter on our stream to filter the stream elements down to those matching our condition.  At that point calling forEach (which itself is a stream terminal operation) executes a function against the remaining stream elements.  In this case it makes a calls to the S3 client to delete each object.

This could also be easily modified to simply return a List of these old objects to pass around.

List<S3ObjectSummary> oldObjects = objectStream
			.filter(obj -> obj.getLastModified().before(cutoffDate))
			.collect(Collectors.toList());

Conclusion

I hope these simple examples give you some ideas for using streams in your application.  Are you using Java 8 streams with the AWS SDK for Java?  Let us know in the comments!