"The datacenter is the computer." That was the provocative title of a talk given by Dr. Luiz Barroso, a distinguished engineer at Google, at the UC Berkeley Reliable Adaptive Distributed Systems Lab (RAD Lab). To program an application such as Gmail or Google Search, which runs on hundreds or thousands of machines, is to program a datacenter. The RAD Lab's mission is to develop the technology to make it possible for a single person with a great new application idea to do just that, without first having to build a Google-sized company around it to do the engineering.
A key enabler of this vision is the recent emergence of cloud computing: the ability to pay as you go for computing, storage, and networking hosted in someone else's datacenter. For example, Amazon Web Services (AWS) allows you to rent servers for 10 cents an hour and storage for 15 cents per gigabyte per month, with no minimum, no maximum, and no up-front payment.
Cloud computing in the classroom
Following Berkeley's long tradition of integrating computer science research into education, we piloted a software project course in 2007 focused exclusively on developing and deploying software-as-a-service (SaaS) applications. In fall 2008, with a generous donation of AWS credits from Amazon, we moved this course from Berkeley-owned infrastructure to the cloud.
One reason for the move was to give undergraduates exposure to cloud-computing tools and technologies because we believe these skills will be in demand. But we also found that cloud computing made it easy to create realistic assignments such as having students saturate a large database server. With a typical small application, it takes 8 to 10 servers to do this, so we would have needed to commandeer 200 servers (40 students, working in pairs) to allow each team to do their own measurements. With cloud computing, we acquired the 200 servers in a few minutes, and released them after the lab was turned in a few days later. Similarly, early in the course students did development on their laptops, but toward the end they needed to deploy and demonstrate their working applications. This short-term surge in demand for servers at the end of the semester was a perfect fit for cloud computing, as we discuss in our technical report Above the Clouds: A Berkeley View of Cloud Computing (see sidebar).
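The server arithmetic in the lab example can be checked with a short sketch. The student count, team size, servers per team, and hourly price come from the article; the three-day lab duration is an assumption made only for illustration (the article says "a few days"), and the function names are hypothetical.

```python
import math

# Figures taken from the article.
STUDENTS = 40
TEAM_SIZE = 2                 # students worked in pairs
SERVERS_PER_TEAM = 10         # upper end of the "8 to 10 servers" estimate
PRICE_PER_SERVER_HOUR = 0.10  # USD, the AWS server rate quoted earlier

# Assumption (not from the article): the servers were held for three full days.
ASSUMED_LAB_DAYS = 3


def servers_needed(students: int, team_size: int, servers_per_team: int) -> int:
    """Total servers so that every team can run its own measurements."""
    teams = math.ceil(students / team_size)
    return teams * servers_per_team


def rental_cost(servers: int, days: float, price_per_hour: float) -> float:
    """Pay-as-you-go cost of keeping `servers` running for `days` days."""
    return servers * days * 24 * price_per_hour


if __name__ == "__main__":
    total = servers_needed(STUDENTS, TEAM_SIZE, SERVERS_PER_TEAM)
    cost = rental_cost(total, ASSUMED_LAB_DAYS, PRICE_PER_SERVER_HOUR)
    print(f"{total} servers for {ASSUMED_LAB_DAYS} days: about ${cost:,.2f}")
```

With 20 teams at 10 servers each, the sketch reproduces the 200-server figure. The cost it reports depends entirely on the assumed duration and is not a number from the course.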
Because cloud computing gives the illusion of near-infinite resources available on demand, we were able to drive home the important lesson that horizontal scalability, not single-node performance, is the critical design goal for datacenter-scale applications. It would be more difficult to argue this point while simultaneously telling students they are limited to using a single server (or less) due to limited departmental resources. And when we teach students about managing redundancy for scalability and high availability, the students can work hands-on with all the "moving parts" — load balancers, web server front ends, etc. In short, cloud computing allows students a datacenter-like experience without building or managing a datacenter.
Courseware management was made easier by AWS's pervasive use of virtual machine technology: we created a single virtual machine image containing the whole software stack, and each student or team could deploy that image on an EC2 server instance and instantly have the same experience as if they themselves were administering the application server in a datacenter. We would never grant undergraduates root access on a shared Berkeley server, but with EC2 they can have root access on their own image, and any damage they do can be undone by simply reinstantiating the image on a new server.
Students reported that AWS was no harder to use than Berkeley-owned equipment, and since AWS has an active developer community, its question boards, blogs, and documentation are far more comprehensive than what limited course staff could provide.
The course has been so successful that some student projects continued to have a life of their own after the course ended. For example, WeJoinIn.com, which coordinates teams of volunteers to staff an activity and was used to organize the ASUC's voter registration drive in 2008, began as a project in our course. Cloud computing positions projects perfectly for such a transition: the most popular projects can scale up on demand. If this seems an unlikely scenario, remember that the initial prototype of eBay was created over a long weekend by founder Pierre Omidyar.
Cloud computing isn't for every course or topic; we are still exploring how to map scientific computing onto a cloud computing environment, for example. But given the RAD Lab's research focus — allowing scale-up of such paradigm-changing services to be done by a single person — cloud computing is an important enabler, and by using it in our courses as well, we can deploy our research tools to Berkeley students and simultaneously keep them on the cutting edge of real-world experience.
We've found that cloud computing simplifies economics, courseware management, and resource provisioning for our classes, and we encourage others to investigate it. Amazon is preparing to announce a program to support teaching and research projects through donations of AWS usage credits; Kurt Messersmith can be contacted for details.
Of course, in practice many factors affect the decision of whether to move to the cloud, including economics, technical considerations, and data privacy and auditability. We discuss these at length in our technical report, Above the Clouds: A Berkeley View of Cloud Computing.
The RAD Lab's recent technical report, Above the Clouds: A Berkeley View of Cloud Computing, has been widely disseminated among the trade press, academia, and industry. It outlines what we believe are the most significant new aspects of IT affected by cloud computing, and the top opportunities for and obstacles to the adoption and long-term viability of cloud computing. It is available as a PDF at http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf.