Yahoo! is building a set of scalable, highly-available data storage and processing services, and deploying
them in a cloud model to make application development and ongoing maintenance significantly
easier. In this paper we discuss the vision and requirements, as well as the components that will go into
the cloud. We highlight the challenges and research questions that arise from trying to build a comprehensive
web-scale cloud infrastructure, emphasizing data storage and processing capabilities. (The
We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.