High-end websites are a part of our lives
In 2004, when we worked on a free web analytics product combined with click-fraud detection for AdWords (the era before Google Analytics was around), we struggled with scalability. We installed multiple servers, split the website into multiple subdomains pointing to different servers (a crude kind of load balancing), added caches for logged-out users, and did many other things to handle millions of hits a day. We really struggled, and we learned a lot about big-data websites.
It is not 2004 anymore, thankfully
As developers, when we look back, we are glad it's not 2004 anymore; the options are now plentiful. You have highly scalable cloud options, web servers built for specific purposes, node.js for notifications (no need to poll regularly), and many database options (the NoSQL stores, MongoDB, Cassandra, and so on). There are now so many options that it is easy to get confused about what to use and what not to. Also, in the developer community it has become fashionable to use the latest technology irrespective of logical fit. Recently we had to roll back a high-end project to a simpler version because of financial constraints (we failed to foresee the challenges with the Silverlight community; now we are rebuilding it).
The concepts are the same, the implementation needs to be logical
The right scalable system is one whose capacity increases linearly as hardware is added. In a highly scalable system, if you have one machine and add another, your capacity doubles. If you add X more machines, your capacity increases by X × 100%. This is also called horizontal scalability, the most sought-after kind of scalability.
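The arithmetic above can be written down as a tiny sketch. This is only an illustration of the ideal, perfectly linear case; the function names and the sample figure of 1,000 requests per second per machine are our own assumptions, and real systems usually fall somewhat short of this line.

```python
def linear_capacity(per_machine_capacity: float, machines: int) -> float:
    """Total capacity of an ideally (linearly) scalable system."""
    if machines < 1:
        raise ValueError("need at least one machine")
    return per_machine_capacity * machines


def percent_increase(initial_machines: int, added_machines: int) -> float:
    """Capacity gain, in percent, from adding machines to a linear system."""
    if initial_machines < 1 or added_machines < 0:
        raise ValueError("invalid machine counts")
    return added_machines / initial_machines * 100.0


if __name__ == "__main__":
    # Hypothetical figure: one machine serves 1,000 requests per second.
    print(linear_capacity(1000.0, 1))  # one machine
    print(linear_capacity(1000.0, 2))  # add another: capacity doubles
    print(percent_increase(1, 3))      # add X = 3 more machines
```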
- (Load balancer) A user request goes to the Elastic Load Balancer (traffic to the DNS name provided by the Elastic Load Balancer is automatically distributed across your load-balanced, healthy Amazon EC2 instances).
- (Web server + web server scalability) At run time, instances are created from the main web app server (this is where EC2, the Elastic Compute Cloud, does its work).
- (Storage system) Static content resides in S3 or any other storage system. Videos that are rarely accessed can be moved back to our own servers or to low-cost servers to cut costs. We are even looking at good Indian datacenters such as http://www.ctrls.in/ to bring the cost down.
- (Database) All read operations are scaled across multiple DBs. Writes can be divided by function: comments can go to one DB, login logs to another, and so on. Our preference is PostgreSQL/MySQL (we shall discuss this).
- (SQS) We make all tasks asynchronous and add them to a task list, which is then managed by the task master and sent to the right servers.
- (CDNs) Content delivery can be scaled with a CloudFront distribution (for videos, we feel Akamai will be better).
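The database bullet above is the piece that is easiest to get wrong, so here is a minimal sketch of how the routing could look: writes go to the primary of a functional shard, and reads rotate across that shard's replicas. The `Shard` and `ShardRouter` names and the server names are hypothetical, and the router returns plain strings where a real application would hand back PostgreSQL/MySQL connections.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Shard:
    """One functional shard: a primary for writes plus optional read replicas."""
    primary: str
    replicas: List[str] = field(default_factory=list)


class ShardRouter:
    """Picks a database server by function (comments, login logs, ...)."""

    def __init__(self, shards: Dict[str, Shard]) -> None:
        self._shards = shards
        # Round-robin over replicas; fall back to the primary if there are none.
        self._cycles: Dict[str, Iterator[str]] = {
            name: itertools.cycle(shard.replicas or [shard.primary])
            for name, shard in shards.items()
        }

    def for_write(self, function: str) -> str:
        """All writes for a function go to that shard's primary."""
        return self._shards[function].primary

    def for_read(self, function: str) -> str:
        """Reads are spread across the shard's replicas in turn."""
        return next(self._cycles[function])


if __name__ == "__main__":
    # Hypothetical server names, for illustration only.
    router = ShardRouter({
        "comments": Shard("comments-primary",
                          ["comments-replica-1", "comments-replica-2"]),
        "login_logs": Shard("logs-primary"),
    })
    print(router.for_write("comments"))
    print(router.for_read("comments"))
    print(router.for_read("comments"))
```

Keeping the routing decision in one small class means adding a replica or a new functional shard is a configuration change rather than a code change scattered across the application.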
For any high-end website we need:
- Load balancer (it manages the load and distributes it)
- Amazon EC2 + S3 + CloudFront (it scales the servers as and when needed). There can be alternatives here.
- Functionally sharded MySQL DB + x read slaves
- X Memcache nodes (to cache the results of expensive, high-level operations and speed them up)
- X Task Routers + Y Task Processors
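To make the "task routers + task processors" item concrete, here is a minimal single-process sketch. It is not how SQS itself works: an in-memory `queue.Queue` stands in for the hosted queue, threads stand in for processor servers, and the `TaskRouter` class and the task kinds are names we made up for illustration.

```python
import queue
import threading
from typing import Callable, Dict, List


class TaskRouter:
    """Routes each task to the queue for its kind; processors drain the queues."""

    def __init__(self, handlers: Dict[str, Callable[[str], str]],
                 processors_per_kind: int = 2) -> None:
        self.results: List[str] = []
        self._lock = threading.Lock()
        self._queues: Dict[str, queue.Queue] = {}
        for kind, handler in handlers.items():
            q: queue.Queue = queue.Queue()
            self._queues[kind] = q
            # Y processors per task kind, each running in its own thread.
            for _ in range(processors_per_kind):
                threading.Thread(target=self._process, args=(q, handler),
                                 daemon=True).start()

    def submit(self, kind: str, payload: str) -> None:
        """Enqueue a task and return immediately (the async part)."""
        self._queues[kind].put(payload)

    def wait(self) -> None:
        """Block until every submitted task has been processed."""
        for q in self._queues.values():
            q.join()

    def _process(self, q: queue.Queue, handler: Callable[[str], str]) -> None:
        while True:
            payload = q.get()
            try:
                result = handler(payload)
                with self._lock:
                    self.results.append(result)
            finally:
                q.task_done()


if __name__ == "__main__":
    # Hypothetical task kinds and handlers.
    router = TaskRouter({
        "email": lambda p: "sent email to " + p,
        "thumbnail": lambda p: "resized " + p,
    })
    router.submit("email", "user@example.com")
    router.submit("thumbnail", "photo.jpg")
    router.wait()
    print(sorted(router.results))
```

The web request only pays for `submit`; the slow work happens on the processors, which can be scaled independently of the web servers.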
Want to read more:
Many who come to this page are, like us, passionate about reading more on high scalability; it's sexier than Megan Fox. Recently we conducted
- http://lethain.com/introduction-to-architecting-systems-for-scale/
- http://highscalability.com
- http://www.stevesouders.com/ - For front end scalability discussions
- We will keep adding to this list. Please share your favorite sites to read; we are keeping the comments open here.