I. Characteristics of large website systems

High concurrency and heavy traffic: the site must handle large numbers of concurrent users and heavy traffic. Google averages 3.5 billion PVs and 300 million unique IP visits per day; Tencent QQ peaked at 140 million online users in 2011; on Taobao's 2012 Double 11 shopping festival, transactions reached 19.1 billion yuan in a single day, and 10 million unique users visited in the first minute of the event.

Aside: the concepts of PV, UV, and IV

PV: page views (Page View), i.e. the number of page visits. Every time a page is opened, the PV count increases by 1; refreshing the page counts as well.

UV: unique visitors (Unique Visitor), the number of distinct visitors; each client computer counts as one visitor.

IV: the number of unique IP visits. Within one statistics period, each distinct IP address that accesses the site is counted once, no matter how many times that IP visits during the period. The period is commonly one day, but it can also be, for example, one hour.
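The three metrics differ only in what gets deduplicated. As a minimal sketch, assuming a hypothetical access log where each record carries the client IP and a client identifier such as a cookie ID (the text above does not specify the log format; `AccessRecord` and `traffic_stats` are illustrative names), all three can be computed over one statistics period like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    # Hypothetical log entry: one record per page open or refresh.
    ip: str
    client_id: str  # assumed identifier for one client computer, e.g. a cookie ID
    path: str


def traffic_stats(records):
    """Return (PV, UV, IV) for records that all fall within one statistics period."""
    pv = len(records)                          # every open or refresh counts
    uv = len({r.client_id for r in records})   # distinct client computers
    iv = len({r.ip for r in records})          # distinct IP addresses, each counted once
    return pv, uv, iv


if __name__ == "__main__":
    one_day = [
        AccessRecord("10.0.0.1", "cookie-a", "/"),
        AccessRecord("10.0.0.1", "cookie-a", "/"),         # refresh: only PV grows
        AccessRecord("10.0.0.1", "cookie-b", "/item/42"),  # second computer behind the same IP
        AccessRecord("10.0.0.2", "cookie-c", "/"),
    ]
    print(traffic_stats(one_day))  # (4, 3, 2)
```

The example also shows why UV and IV can differ: two computers behind one shared IP address count as two visitors but one IP.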

High availability: the system must provide uninterrupted 24/7 service. Outages of large websites usually become news, for example the hijacking of Baidu's domain name in 2010, Taobao's outage during Double 11, the 12306 site going down under concurrent load, and Weibo crashing under traffic spikes caused by celebrity news.

Massive data: the site must store and manage massive amounts of data, which requires a large number of servers. Facebook users upload close to 1 billion photos per week, Baidu has indexed on the order of ten billion web pages, and Google runs on the order of a million servers to serve users worldwide.

Widely distributed users and complex network conditions: many large Internet companies serve users worldwide, and network conditions vary greatly from region to region. Within China there is also the problem of interconnection between different carriers' networks. Repeated faults in the undersea fiber-optic cables between China and the US have led many Internet companies to consider building data centers overseas.

Hostile security environment: because the Internet is open, large websites are frequent targets of attack, as the Facebook user data leak showed.

Rapidly changing requirements and frequent releases: unlike traditional enterprise software, Internet products must adapt quickly to the market and to user needs, so they are released extremely often. Small and medium-sized Internet companies release even more frequently, sometimes more than a dozen times a day.

Progressive development: unlike traditional software, whose functional and non-functional requirements are planned up front, most large websites started as small ones and grew incrementally. Facebook was started by Mark Zuckerberg in his Harvard dorm room, and Alibaba was born in Jack Ma's living room. Good Internet products are grown through iteration; they are not great from the start.

II. Evolution of large website architecture

1. Website architecture in the initial stage

Large websites grow out of small websites, and so does their architecture. In the early stage, traffic is low and a single server is enough; this is also the setup most enterprise applications choose.

The application, database, and files are all deployed on one server, typically running Linux, with the application written in PHP and deployed on Apache and MySQL as the database. A collection of open-source software and one cheap server are enough to get the site running.

2. Separate application services from data

As the business grows, one server can no longer meet demand: more and more users degrade performance, and more and more data exhausts storage space. The application and data must then be separated. After separation, the site uses three servers: an application server, a file server, and a database server.

After the separation, servers with different characteristics take on different roles, which greatly improves the site's concurrent processing capacity and data storage capacity and supports further growth. But as the business and traffic keep growing, the site faces a new challenge: excessive database load causes access latency, which in turn degrades the performance of the whole site and the user experience.

3. Use caches to improve site performance

Website access, like the distribution of wealth in the real world, usually follows the 80/20 rule: 80% of business access is concentrated on 20% of the data. Taobao buyers mostly browse a small number of products with many transactions and good reviews; Baidu searches concentrate on a small set of popular keywords, and users usually look only at the first two pages of results.

Since most reads touch only a small part of the data, caching that hot data in memory reduces the load on the database and speeds up data access across the whole site. It also improves the database's write performance, because the database spends less of its capacity on reads.
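This is commonly implemented as the cache-aside pattern: read from the cache first and, on a miss, read the database and populate the cache. The sketch below is a hypothetical illustration, assuming an in-memory LRU cache built on `collections.OrderedDict` and a plain dict standing in for the database; `LRUCache` and `cached_read` are illustrative names, not from the original.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity in-memory cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def __contains__(self, key):
        return key in self._data

    def get(self, key):
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def cached_read(cache, db, key, stats):
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    if key in cache:
        stats["hits"] += 1
        return cache.get(key)
    stats["db_reads"] += 1
    value = db[key]        # hypothetical database lookup
    cache.put(key, value)  # populate the cache so the next read is a hit
    return value


if __name__ == "__main__":
    db = {i: f"product-{i}" for i in range(10)}
    cache = LRUCache(capacity=3)
    stats = {"hits": 0, "db_reads": 0}
    # Skewed traffic: hot products 0 and 1 dominate, 5 and 9 are cold.
    for key in [0, 1, 0, 1, 5, 0, 1, 0, 1, 9]:
        cached_read(cache, db, key, stats)
    print(stats)  # {'hits': 6, 'db_reads': 4}
```

With a cache holding only 30% of the keys, the skewed access pattern still serves 6 of 10 reads from memory; the LRU policy keeps the hot items and evicts the cold one.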

Site caches come in two kinds: a local cache on the application server and a remote cache on dedicated distributed cache servers. A local cache is faster to access than a remote cache, but it is limited by the application server's memory, and the cache often competes with the application for memory. A remote cache does not have this problem: a remote distributed cache can be deployed as a cluster of dedicated cache servers with large memory, so in theory the cache service is not limited by memory capacity.
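The two tiers can be combined. The following is a minimal sketch under stated assumptions: the remote distributed cache is simulated by a few in-process dicts ("nodes") with keys spread across them by a `zlib.crc32` hash, and the local tier is a deliberately small dict because it shares memory with the application. `RemoteCacheCluster`, `TwoLevelCache`, and the loader are hypothetical names.

```python
import zlib


class RemoteCacheCluster:
    """Simulated distributed cache: keys are spread over dedicated cache nodes by hash."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]

    def _node(self, key):
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def get(self, key):
        return self._node(key).get(key)

    def set(self, key, value):
        self._node(key)[key] = value


class TwoLevelCache:
    """Check the small local cache, then the shared remote cluster, then the loader."""

    def __init__(self, remote, loader, local_capacity):
        self.remote = remote
        self.loader = loader  # e.g. a database query; assumed to return non-None values
        self.local = {}
        self.local_capacity = local_capacity

    def get(self, key):
        if key in self.local:
            return self.local[key], "local"
        value = self.remote.get(key)
        source = "remote"
        if value is None:
            value = self.loader(key)
            self.remote.set(key, value)
            source = "database"
        if len(self.local) < self.local_capacity:
            # Local memory is shared with the application, so this tier stays small.
            self.local[key] = value
        return value, source


if __name__ == "__main__":
    remote = RemoteCacheCluster(node_count=3)
    app1 = TwoLevelCache(remote, lambda k: f"val-{k}", local_capacity=1)
    app2 = TwoLevelCache(remote, lambda k: f"val-{k}", local_capacity=1)
    print(app1.get("a"))  # ('val-a', 'database')
    print(app1.get("a"))  # ('val-a', 'local')
    print(app2.get("a"))  # ('val-a', 'remote'): the remote tier is shared by app servers
```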

Caching effectively relieves database access pressure, but a single application server can handle only a limited number of connections, so during peak traffic the application server becomes the site's bottleneck.

4. Use an application server cluster to improve concurrency

Clustering is the key means of solving high concurrency and massive data. When one server's processing power or storage space is insufficient, don't try to replace it with a more powerful server; consider a cluster deployment instead. For a large site, no single server, however powerful, can keep up with continuously growing business.

The application servers are deployed as a cluster behind a load-balancing server that distributes requests among them. This spreads peak traffic across the application servers so that no single server bears too much load. If requests keep increasing, more application servers can be added to the existing cluster.
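The text does not say which distribution policy the load balancer uses; round-robin is one common choice and is assumed here. This hypothetical `RoundRobinBalancer` sketch also shows why scaling out is easy: a new application server simply joins the rotation.

```python
class RoundRobinBalancer:
    """Hands each incoming request to the next application server in turn."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._next = 0

    def add_server(self, server):
        # Scaling out: the new server takes part from the next full rotation.
        self._servers.append(server)

    def pick(self):
        server = self._servers[self._next]
        self._next = (self._next + 1) % len(self._servers)
        return server


if __name__ == "__main__":
    lb = RoundRobinBalancer(["app1", "app2"])
    print([lb.pick() for _ in range(4)])  # ['app1', 'app2', 'app1', 'app2']
    lb.add_server("app3")
    print([lb.pick() for _ in range(3)])  # ['app1', 'app2', 'app3']
```

Real balancers often add health checks and weighting; those are omitted to keep the sketch short.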

5. Separate database reads and writes

Caching offloads part of the database access, and most reads can be served without touching the database. But some reads (cache misses and expired entries) and all writes still go directly to the database, so once the site reaches a certain scale the database load becomes heavy again.

Most mainstream databases today provide master-slave hot standby: by configuring a master-slave relationship between two database servers, updates on one server are synchronized to the other. The site can use this feature to separate database reads from writes and reduce the load on the database.

When writing data, the application accesses the master; when reading, it accesses the slave. The master propagates updates to the slave through master-slave replication. To make the read/write-separated database easy to use, the application server side usually includes a dedicated data access module that makes the separation transparent to the application.
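A data access module of this kind can be sketched as a router. The sketch below is hypothetical: `Database` is a dict-backed stand-in for a server, and replication is simulated as an explicit `replicate()` step so that replication lag is visible. With asynchronous replication, a read issued right after a write may not see it yet.

```python
class Database:
    """Stand-in for one database server holding key/value rows."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class ReadWriteRouter:
    """Hypothetical data access module hiding read/write separation from the application."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._next_slave = 0
        self._pending = []  # changes not yet applied to the slaves

    def write(self, key, value):
        # All writes go to the master.
        self.master.rows[key] = value
        self._pending.append((key, value))

    def replicate(self):
        # Master-slave replication, simulated here as an explicit, asynchronous step.
        for key, value in self._pending:
            for slave in self.slaves:
                slave.rows[key] = value
        self._pending.clear()

    def read(self, key):
        # Reads are spread across the slaves in turn.
        slave = self.slaves[self._next_slave]
        self._next_slave = (self._next_slave + 1) % len(self.slaves)
        return slave.rows.get(key)


if __name__ == "__main__":
    router = ReadWriteRouter(Database("master"), [Database("slave-1"), Database("slave-2")])
    router.write("user:1", "alice")
    print(router.read("user:1"))  # None: the change has not reached the slaves yet
    router.replicate()
    print(router.read("user:1"))  # alice
```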

6. Use reverse proxies and CDNs to speed up website responses

As the site grows, so does its user base. Because the network in China is complex, users in different regions experience very different access speeds. Research shows that access latency is positively correlated with user churn: the slower the site, the more users it loses. To provide a better experience and retain users, the site must respond faster, and the main means are CDNs and reverse proxies.

Both CDNs and reverse proxies are basically caches. The difference is that a CDN is deployed in the data centers of network service providers, so when users request the site's services they can fetch data from the nearest data center. A reverse proxy, by contrast, is deployed in the site's central data center: when a user request reaches the central data center, the first server it hits is the reverse proxy server, which returns the resource directly to the user if it has cached it.

The purpose of using a CDN is to return data to users as early as possible: on the one hand this speeds up user access, and on the other it reduces the load on the backend servers.
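The request path described above can be sketched as a chain of caches: regional CDN edge, then the reverse proxy in the central data center, then the application servers. This is a hypothetical in-process model (`CacheLayer`, `origin`, `build_site`, and `fetch` are illustrative names); it assumes the user has already been directed to the edge for their region, which real CDNs commonly do through DNS-based scheduling, and it ignores expiry.

```python
class CacheLayer:
    """One caching hop (a CDN edge node or the reverse proxy) in front of an upstream."""

    def __init__(self, name, upstream):
        self.name = name
        self.upstream = upstream  # callable: path -> (content, served_by)
        self.store = {}

    def get(self, path):
        if path in self.store:
            return self.store[path], self.name  # cache hit: answer without going further
        content, served_by = self.upstream(path)
        self.store[path] = content              # keep a copy for later requests
        return content, served_by


def origin(path):
    # Stand-in for the application servers behind the reverse proxy.
    return f"<page {path}>", "origin"


def build_site(regions):
    """Hypothetical topology: one CDN edge per region, all backed by one reverse proxy."""
    reverse_proxy = CacheLayer("reverse-proxy", origin)
    return {region: CacheLayer(f"cdn-{region}", reverse_proxy.get) for region in regions}


def fetch(edges, user_region, path):
    # Assumption: the user has already been routed to the edge for their region.
    return edges[user_region].get(path)


if __name__ == "__main__":
    edges = build_site(["beijing", "guangzhou"])
    print(fetch(edges, "beijing", "/index"))    # ('<page /index>', 'origin')
    print(fetch(edges, "beijing", "/index"))    # ('<page /index>', 'cdn-beijing')
    print(fetch(edges, "guangzhou", "/index"))  # ('<page /index>', 'reverse-proxy')
```

Only the first request reaches the application servers; the second is answered at the user's nearest edge, and a user in another region is served by the reverse proxy without touching the origin.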

7. Use distributed file systems and distributed database systems

No single server, however powerful, can keep up with the continuous business growth of a large site. Read/write separation splits the database from one server into two, but as the business keeps growing this is still not enough, and distributed database servers are needed. The same holds for file systems, which require a distributed file system.

A distributed database is the last resort for splitting a website's database and is used only when a single table becomes very large. Short of that, the most common way for a site to split its database is by business: the databases of different businesses are deployed on different servers.
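The two kinds of split can be combined in one routing function. In this hypothetical sketch, each business line gets its own database server, and only an assumed very large "order" table is further sharded by a stable hash of its key. All server names and the shard count are illustrative assumptions.

```python
import zlib

# Hypothetical layout: each business line gets its own database server.
BUSINESS_DB = {"user": "db-user", "product": "db-product"}

# Hypothetical: only the very large order table is also split across servers.
SHARDED = {"order": ["db-order-0", "db-order-1", "db-order-2", "db-order-3"]}


def route(business, key):
    """Return the database server that holds `key` for the given business line."""
    if business in SHARDED:
        shards = SHARDED[business]
        # crc32 gives the same value in every process, unlike Python's salted hash() for str.
        return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]
    return BUSINESS_DB[business]


if __name__ == "__main__":
    print(route("user", "42"))         # db-user
    print(route("order", "o-1001"))    # one of db-order-0 .. db-order-3, always the same one
```

Simple modulo hashing means most keys move when the shard count changes; that trade-off is accepted here for brevity.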

8. Use NoSQL and search engines

As the business expands, requirements for data storage and retrieval grow ever more demanding, and the site needs non-relational storage technologies such as NoSQL and non-database query technologies such as search engines.

Both NoSQL and search engines originated in the Internet industry and have better support for scalable, distributed deployment. The application server accesses the various data sources through a unified data access module, which relieves the application of managing many data sources.
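Search engines answer keyword queries with an inverted index (a map from each term to the documents containing it) rather than by scanning rows. A minimal sketch, assuming whitespace tokenization and AND semantics for multi-word queries; `build_inverted_index` and `search` are hypothetical names, and real engines add ranking, stemming, and distributed index shards.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every term in the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())  # .get does not add keys to the defaultdict
    return result


if __name__ == "__main__":
    products = {1: "red cotton shirt", 2: "blue cotton jeans", 3: "red wool scarf"}
    index = build_inverted_index(products)
    print(search(index, "cotton"))      # {1, 2}
    print(search(index, "red cotton"))  # {1}
```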

9. Split the business

To cope with increasingly complex business scenarios, large sites divide and conquer, splitting the whole site's business into separate product lines. For example, a large shopping site may split its home page, shops, orders, buyers, sellers, and so on into separate product lines, each managed by an independent business team.

Technically, the site is likewise split by product line into separate applications, each deployed and maintained independently. Applications can be linked through hyperlinks or exchange data through message queues; most commonly, though, they form a connected, complete system by accessing the same data storage system.
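Distributing data through a message queue decouples the application that produces an event from the applications that consume it. As a hypothetical in-process sketch, the standard library's `queue.Queue` stands in for a real message broker, and each subscribing application gets its own queue; the topic name and message fields are illustrative.

```python
import queue


class MessageBus:
    """In-process stand-in for a message queue broker: one queue per subscriber."""

    def __init__(self):
        self._subscribers = {}  # topic -> list of queue.Queue

    def subscribe(self, topic):
        inbox = queue.Queue()
        self._subscribers.setdefault(topic, []).append(inbox)
        return inbox

    def publish(self, topic, message):
        # The publisher does not know or care which applications consume the message.
        for inbox in self._subscribers.get(topic, []):
            inbox.put(message)


if __name__ == "__main__":
    bus = MessageBus()
    seller_app = bus.subscribe("order.created")
    buyer_app = bus.subscribe("order.created")
    bus.publish("order.created", {"order_id": 1001, "buyer": "u1"})
    print(seller_app.get_nowait())  # {'order_id': 1001, 'buyer': 'u1'}
    print(buyer_app.get_nowait())   # {'order_id': 1001, 'buyer': 'u1'}
```

Because each consumer has its own queue, the order application can keep publishing even if a consumer is slow or temporarily offline.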

10. Use distributed services

As the business is split into ever smaller pieces and storage systems grow ever larger, the overall complexity of the application system grows exponentially, and deployment and maintenance become increasingly difficult. Because every application connects to every database system, in a site with tens of thousands of servers the number of connections grows with the square of the number of servers, exhausting database connection resources and causing denial of service.

Since every application system performs many of the same business operations, these shared operations can be extracted and deployed as independent services. These services connect to the databases and provide common business services to the applications.
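A back-of-the-envelope model shows why extracting shared services relieves the database. Assume, as a simplification not taken from the text, that every client server holds `per_pair` connections to every database server. The server counts below (10,000 application servers, 1,000 database servers, 200 service servers) are illustrative, not measured figures.

```python
def db_connections(clients, db_servers, per_pair=1):
    """Connections the database tier holds if every client connects to every DB server."""
    return clients * db_servers * per_pair


if __name__ == "__main__":
    apps, dbs, services = 10_000, 1_000, 200
    direct = db_connections(apps, dbs)            # applications talk to the databases directly
    via_services = db_connections(services, dbs)  # only the shared services talk to the databases
    print(direct, direct // dbs)              # 10000000 10000  (total, per DB server)
    print(via_services, via_services // dbs)  # 200000 200
```

If the number of application servers and database servers both grow with the site size N, the direct total grows like N squared (doubling both quadruples it), which matches the scaling problem described above; routing database access through a smaller service tier caps the per-database connection count.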

By this point in the evolution, most of the technical problems a large website faces have been solved. Issues such as real-time data synchronization across data centers and problems specific to a particular site's business can be handled by combining and improving the existing technical architecture.

III. The value of large website architecture evolution

1. The core value of large website architecture technology is responding flexibly as the site's needs change

The core value of large website architecture technology is not building a large website from nothing, but letting a small website evolve gradually into a large one as its business develops. Throughout this process there is no need to abandon what already exists or to tear everything down and start over, which is why technology choices matter so much. Large companies such as Facebook, Google, and Taobao all followed this path of development.

2. The main driving force behind large website technology is the development of the site's business

Innovative business models place ever higher demands on website architecture, and it is in meeting those demands that innovative website architectures mature. Business makes the technology, just as the undertaking makes the people, not the other way around.

IV. Common misconceptions in website architecture design

1. Blindly following the solutions of big companies

2. Pursuing technology for its own sake

3. Trying to solve every problem with technology


