According to a 2020 survey of cloud native users in China, more than 60% of users run container technology in production, nearly 80% need to support production clusters of 1,000 or more nodes, more than 13% run more than 5,000 nodes, and 9% run more than 10,000 nodes. As cloud native technology spreads, more and more enterprises are moving their core business onto containers. The scale of container clusters in enterprise production environments is growing explosively, and running containers at large scale has become a necessity for enterprise growth. At present, the open source version of Kubernetes supports up to 5,000 nodes and 150,000 pods, which can no longer meet growing business needs.
What difficulties do enterprises face when running containers at scale?
Large-scale container clusters provide more business load capacity, better handling of traffic bursts, and a more efficient cluster management model. As a practitioner and leader in the cloud native field, Alibaba Cloud was the first to reach 10,000 nodes and 1 million pods in a single cluster. Compared with the community version of Kubernetes, that is 2 times the number of nodes and about 6.7 times the number of pods in a single cluster. Drawing on its experience serving millions of customers, Alibaba Cloud has developed a four-step approach to deploying containers at scale, which helps enterprises overcome the difficulties of large-scale container adoption and cope with growing demands on scale.
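The scale comparison above can be checked with simple arithmetic. The sketch below only reuses the figures quoted in this article (the community Kubernetes limits and the single-cluster scale reported for Alibaba Cloud); the function and variable names are illustrative.

```python
# Community Kubernetes limits as quoted in this article.
COMMUNITY_NODES = 5_000
COMMUNITY_PODS = 150_000

# Single-cluster scale reported for Alibaba Cloud in this article.
REPORTED_NODES = 10_000
REPORTED_PODS = 1_000_000


def scale_ratio(achieved: int, baseline: int) -> float:
    """Return how many times larger `achieved` is than `baseline`."""
    return achieved / baseline


node_ratio = scale_ratio(REPORTED_NODES, COMMUNITY_NODES)
pod_ratio = scale_ratio(REPORTED_PODS, COMMUNITY_PODS)

print(f"nodes: {node_ratio:.1f}x, pods: {pod_ratio:.1f}x")
# prints "nodes: 2.0x, pods: 6.7x"
```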
When enterprises face business or IT demands such as bursty traffic or compute-intensive workloads, or need to further improve operations and maintenance efficiency, the capacity of a single cluster becomes a bottleneck. For example, genomic computing, online flash sales, and similar services generate heavy load in a short time, which severely tests the computing resources a single cluster can hold. A single cluster then needs to support a large number of nodes so that pods can be launched in batches, and enterprises start to consider expanding their clusters. However, large-scale clusters are not a silver bullet. Enterprises need to size cluster capacity according to the characteristics of their own business in order to realize business value. Blindly pursuing cluster scale enlarges the blast radius of any failure.
Step 2: Scaling containers is not simply a matter of making the cluster bigger. How can an enterprise optimize the whole system from the bottom up and clear the bottlenecks at every layer?
Kubernetes is the operating system of the cloud native era, and the cloud environment it is deployed in is large and complex. Scaling containers therefore requires a complete set of optimizations, from the underlying cloud resources up to the applications on top. Enterprise users need to focus on three areas: 1. at the cloud product level, lift the quota limits on cloud resources; 2. at the cluster component level, raise the ceiling on resource scale; 3. at the Kubernetes resource level, tune the cluster configuration policies so that resources can actually reach the target scale.
When a container cluster grows by a factor of N, the performance of storage, cluster networking, and application distribution is put under heavy strain. For example, network traffic in the data center of a large-scale cluster is usually high, and network latency and jitter are amplified as well, which affects the transmission efficiency and stability of the cluster network. In addition, in the routine scenario of publishing and updating applications in batches across a large cluster, 10,000 nodes pulling an image at the same moment creates a huge network burst, which puts great pressure on the image service and on network bandwidth. The purpose of scaling containers is to provide stronger technical support, which means not only preserving the original performance but also improving overall performance. Enterprise users can focus their optimization on four areas: node and pod scaling efficiency, network efficiency (throughput and latency), DNS resolution efficiency, and image acceleration.
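To make the image pull burst concrete, here is a back-of-the-envelope sketch. The 10,000-node figure comes from the text above; the image size (500 MB) and the registry egress bandwidth (100 Gbit/s) are purely hypothetical assumptions chosen for illustration, not measurements from any real cluster or registry.

```python
def pull_burst(nodes: int, image_mb: float, registry_gbps: float) -> tuple[float, float]:
    """Estimate a simultaneous image pull with no caching or peer-to-peer sharing.

    Returns (total data in decimal GB, seconds needed to serve it at the
    given registry egress bandwidth).
    """
    total_gb = nodes * image_mb / 1000       # MB -> GB (decimal units)
    seconds = total_gb * 8 / registry_gbps   # GB -> Gbit, divided by Gbit/s
    return total_gb, seconds


# Hypothetical inputs: only the node count is taken from the article.
total_gb, seconds = pull_burst(nodes=10_000, image_mb=500, registry_gbps=100)

print(f"total transfer: {total_gb:.0f} GB, minimum time: {seconds:.0f} s")
# prints "total transfer: 5000 GB, minimum time: 400 s"
```

Under these assumed numbers the registry would need several minutes at full line rate just to serve one rollout, which is why image acceleration techniques such as on-demand loading are listed among the optimization areas.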
Step 4: The toughest challenge after scaling containers is stability
Alibaba Cloud helps enterprises run containers at scale with a one-stop service
To address the difficulties enterprises face in running large-scale clusters, Alibaba Cloud provides enterprise-grade container cluster management based on ACK Pro, with extensive performance optimizations in the API server and scheduler that lift resource scale limits, raise the performance ceiling, and keep the cluster stable. Its self-developed high-performance container network, Terway, reduces pod latency by 30% and lowers the performance overhead of Services at large scale. This not only removes the network bottleneck of large clusters but also delivers near-native cloud network performance, so the cluster responds faster. The enterprise-grade image registry ACR supports dedicated storage and on-demand image loading, which cuts startup time by 60% and solves the problem of slow image pulls across a large number of nodes. By integrating Alibaba Cloud's storage, network, and security capabilities, Alibaba Cloud gives enterprises a one-stop way to run containers at scale with strong performance: more efficient network forwarding, more scalable storage, more efficient application and image distribution, and more stable and secure large-scale cluster management.
It is worth noting that at the recent 2020 Cloud Native Industry Conference, Alibaba Cloud became the first cloud service provider to pass the large-scale container performance test of the China Academy of Information and Communications Technology (CAICT), and obtained the highest certification level, "Excellence". In CAICT's container scale evaluation, Alibaba Cloud Container Service scored well ahead of the other participating vendors on the full-load stress test, network latency, network performance loss, and other measures.
As a result, Alibaba Cloud has ample and flexible service capacity and can tailor container cluster services to an enterprise's current business needs. In addition to supporting the containerized cloud migration of Alibaba Group's internal core systems and of Alibaba Cloud's own cloud products, it also delivers its years of large-scale container experience, in productized form, to the many ecosystem partners and ISVs involved in Double 11. By supporting container deployments across industries around the world, Alibaba Cloud Container Service has built up cloud native application hosting middle-platform capabilities that support cell-based, global, and elastic architectures. It manages more than 10,000 container clusters and provides reliable enterprise-grade services.
Alibaba Cloud has the largest container clusters in China, the richest cloud native product family, and the most comprehensive open source contributions. It provides more than 100 innovative products, including cloud native bare metal servers, cloud native databases, data warehouses, data lakes, containers, microservices, DevOps, and serverless, covering new retail, government affairs, healthcare, transportation, education, and other fields. Alibaba Cloud Container Service is the only vendor in China to be included twice in a row in Gartner's "Competitive Landscape: Public Cloud Container Services" report. Alibaba Cloud covers nine product capabilities, including serverless Kubernetes, service mesh, and container image services, which puts it on par with AWS and ahead of Google, Microsoft, IBM, and Oracle in product breadth.
As container technology becomes more widespread, how to evaluate container performance has become a topic of broad concern in the industry. In response to this pain point, the first ultra-large-scale container performance evaluation results released by the China Academy of Information and Communications Technology (CAICT) objectively reflect container performance at the cluster component level. At the 2020 Cloud Native Industry Conference, Ding Yu, Alibaba Cloud researcher and head of Alibaba cloud native technology, said that Alibaba Cloud has been committed to promoting the adoption of cloud native in China and will work with CAICT to promote the standardization and well-regulated development of China's container market.