Hosted service outages are taken seriously these days; the recent Salesforce.com and Google outages are cases in point. One of the reasons corporates hesitate to outsource even email services is the possibility of situations like these. In-house infrastructure can break down too, but the bar is always set higher when infrastructure or critical apps are hosted outside. Steve Arnold provides a glimpse of Google’s infrastructure, from an architectural standpoint, in his book The Google Legacy. Despite their criticality, one seldom gets to know in detail how hosted services capable of supporting millions and millions of users are architected, hosted and maintained.
Came across this ACM Queue interview courtesy of Dare Obasanjo. In the landscape of today’s megaservices, Hotmail just might be Mount Everest. With 10,000 servers spread around the globe processing billions of e-mail transactions per day, managed by just 100 system administrators, it stands tall among the megaservices offered over the net. Phil, the interviewee, describes Hotmail as a service consisting of thousands of machines and multiple petabytes of data. It executes billions of transactions over hundreds of applications agglomerated over nine years: services that are built on services that are built on services. Some of the challenges are keeping the site running, namely dealing with abuse and spam; keeping an aggressive, Internet-style pace of shipping features and functionality every three and six months; and planning how to release complex changes over a set of multiple releases. Excerpts with edits and comments:
QA is a challenge in the sense that mimicking Internet loads on our QA lab machines is a hard engineering problem. Manageability is a challenge in that you want to keep your administrative headcount flat as you scale out the number of machines.
Migrating terabytes worth of data takes a long time and involves complex capacity planning as well as data-center floor space and power consumption issues. More up-front planning is needed around how to go backwards if the new version fails. He points out that the big difference between shipping products and shipping services is the need for a real awareness of exactly what effect an error or failure is going to have on the operations team.
On new hires, he cautions against their tendency to want to do complex things, because complex things break in complex ways. The veterans want simple designs, with simple interfaces and simple constructs that are easy to understand and debug and easy to put back together after they break.
The administrative mantra is to automate. From an engineering point of view, the requirement has to be to build automation and instrumentation into the service from the get-go. The reality is that managing a live site, mostly because of spam and abuse, puts a ton of pressure on the development and system engineering resources.
The notion of tape backups is probably no longer feasible. Building systems that just back up changes, and back them up to cheap disks, may not be the direction in future; he predicts the emergence of data replicas, with changes applied to those replicas, and ultimately the requirement that these replicas be disconnected and reattached over time.
The problems that are unique to scaling for the internet are those of basic client-server programming: figuring out the browser/HTTP/server data-access patterns and optimizing the protocols, extending these protocols as new functionality is introduced, and ensuring that these protocols work across geo-distributed data centers, where the speed of light becomes a factor. Designing applications with built-in redundancy so that they are resilient to abuse is also a challenge.
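The speed-of-light point can be made concrete with a little arithmetic. The Python sketch below is only an illustration and is not taken from the interview: the function names, the 8,000 km distance and the five round trips are hypothetical, and the two-thirds factor is the usual rule of thumb for how fast light travels in optical fiber. It estimates a lower bound on latency between two data centers, ignoring routing, queuing and processing time.

```python
# Lower bound on network latency between two geo-distributed data centers.
# All names and example figures here are illustrative assumptions.

SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum, km/s
FIBER_FACTOR = 2 / 3               # assumption: light in fiber moves at about 2/3 of c


def min_round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip time in milliseconds over a fiber path of this length."""
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_seconds * 1000


def min_page_latency_ms(distance_km: float, round_trips: int) -> float:
    """Best-case latency for a request that needs several sequential round trips."""
    return round_trips * min_round_trip_ms(distance_km)


if __name__ == "__main__":
    distance_km = 8_000  # hypothetical distance between two data centers
    print(f"one round trip:   {min_round_trip_ms(distance_km):.1f} ms")
    print(f"five round trips: {min_page_latency_ms(distance_km, 5):.1f} ms")
```

Under these assumptions a single round trip costs about 80 ms before any server does any work, so a chatty protocol that needs five sequential round trips spends roughly 400 ms on propagation alone. That is why cutting the number of round trips matters more than raw bandwidth once data centers are spread across continents.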
He points out that, at a certain point, the engineering cost is overwhelmed by the operational costs. For managing a megaservice, the best advice is basically to keep everything as simple as possible: simple processes, simple SKUs, simple engineering. A megaservice consumes the best in all the related worlds: hardware, infrastructure, security, administration, design, management, budgeting, planning and so on. With the world increasingly moving towards the services model, the interview holds tons of wisdom and practical sense. It is a must-read for all, from CIOs to managers to developers to administrators.