Posts tagged ‘aws’
Amazon announces Elastic MapReduce »
Amazon Web Services have launched Elastic MapReduce, a cloud computing service for on-demand data processing. You’ve been able to do this at Amazon before by running Hadoop on EC2 instances, but this looks to wrap it all up in a convenient product and make dynamic scaling easier.
Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).
Using Amazon Elastic MapReduce, you can instantly provision as much or as little capacity as you like to perform data-intensive tasks for applications such as web indexing, data mining, log file analysis, machine learning, financial analysis, scientific simulation, and bioinformatics research. Amazon Elastic MapReduce lets you focus on crunching or analyzing your data without having to worry about time-consuming set-up, management or tuning of Hadoop clusters or the compute capacity upon which they sit.
Languages supported: Java, Ruby, Perl, Python, PHP, R, and C++.
MapReduce: see Google’s whitepaper, and Wikipedia
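For readers new to the model, here is a minimal word-count sketch of the map, shuffle and reduce steps in plain Python. It runs in a single process and does not use Hadoop or the Elastic MapReduce API; the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_words(line) for line in lines)
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The end"]))
```

In a real cluster the map and reduce calls would be spread across many machines, with the framework handling the shuffle in between; the sketch only shows the shape of the computation.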
Amazon adds sort feature to SimpleDB »
Amazon SimpleDB now supports sorted query result sets. Previously, query results came back in insertion order only; now you can sort on one attribute (and only one). This makes many standard relational database use cases more feasible in SimpleDB, since less post-processing of the data is needed.
Sorting on only one attribute is still quite limiting, though, and queries still return only object IDs, so retrieving the full data set takes many further queries.
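To make the cost of ID-only results concrete, here is a rough sketch of the access pattern, using an in-memory dictionary as a stand-in for a SimpleDB domain. None of this is the real SimpleDB API: `DOMAIN`, `query_sorted`, `get_attributes` and `fetch_sorted_items` are hypothetical names, and the request counter only models one round trip per call.

```python
# Hypothetical in-memory stand-in for a SimpleDB domain: item ID -> attributes.
# Values are strings, so the numbers are zero-padded to sort correctly as text.
DOMAIN = {
    "item1": {"title": "Widget", "price": "0030"},
    "item2": {"title": "Gadget", "price": "0010"},
    "item3": {"title": "Doohickey", "price": "0020"},
}


def query_sorted(domain, sort_attribute):
    """Stand-in for a sorted query: returns item IDs only, ordered by one attribute."""
    return sorted(domain, key=lambda item_id: domain[item_id][sort_attribute])


def get_attributes(domain, item_id):
    """Stand-in for a per-item lookup; each call models one extra request."""
    return dict(domain[item_id])


def fetch_sorted_items(domain, sort_attribute):
    """One query for the sorted IDs, then one lookup per ID for the full data."""
    requests = 1  # the initial sorted query
    results = []
    for item_id in query_sorted(domain, sort_attribute):
        results.append((item_id, get_attributes(domain, item_id)))
        requests += 1  # one more request per returned ID
    return results, requests


if __name__ == "__main__":
    items, request_count = fetch_sorted_items(DOMAIN, "price")
    for item_id, attributes in items:
        print(item_id, attributes)
    print("requests made:", request_count)
```

For N matching items this pattern needs N + 1 requests, which is the overhead the paragraph above is complaining about.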
Amazon adds persistent storage to EC2 »
Amazon is adding persistent storage as an option to EC2 — currently it’s in private beta.
Previously, disk storage on an EC2 instance was transient: when the machine was shut down or crashed, the effect was the same as a hard drive failure. (You’d lose your IP address too, but Amazon added static IPs a little while ago.) The path to reliability was to use S3, but that can’t be mounted as a native file system.
The persistent storage appears as a raw, mountable volume that needs to be formatted before use. You’ll be able to take a quick snapshot of the data for backup. No word yet on pricing or performance, but you’d expect them to be in line with S3.
There’s been the option of mounting S3 in EC2 using davfs, which mounts with WebDAV, but that’s a bit of a hack and one wonders what the performance would be like.
Amazon adds static IP addresses to EC2 »
This is great news: Amazon EC2 now lets you reserve static IP addresses and allocate them to your instances. Previously, IP addresses were dynamic: if you shut down an instance, or it crashed, its IP address was lost and went back into the general allocation pool.
This makes EC2 much more viable for running public web sites, because you can now set up a load balancer on a static IP and not have to worry about dynamic DNS or clients that ignore TTLs.