Leiphone note: This article was written exclusively for Leiphone (search "雷锋网" on WeChat to follow the public account) by author tieliu. Related articles on Shenwei and supercomputing are listed at the end for reference.
(Figure from Xinhua)
According to Xinhua News Agency, the new edition of the TOP500 list of the world's supercomputers was published on June 20, and Sunway TaihuLight, built with domestically made chips, replaced Tianhe-2 at the top. Even more significant, whereas Tianhe-2 uses Intel CPUs, Sunway TaihuLight uses the Shenwei 26010, a many-core chip developed entirely in China. Its high-speed interconnect and operating system were also developed domestically.
This not only completely reverses a situation in which China lacked its own supercomputing technology and had to cede control over information security, it also makes the US ban on selling Intel Xeon Phi compute cards to four Chinese supercomputing centers a laughingstock. Once again it shows that whatever foreign technology is denied to China, Chinese engineers will build for themselves. (See the Leiphone article "After the US banned sales of Xeon chips, what is China going through?")
| Shenwei's instruction set
Sunway TaihuLight uses the Shenwei 26010 many-core chip designed by the Shanghai High-Performance IC Design Center. The chip is built on a 28nm process, runs at 1.45 GHz, has 260 cores, and reaches a double-precision floating-point peak of 3.06 TFlops, putting it on par with Intel's best chips in double-precision performance. The strong performance of the Shenwei 26010, combined with a good system architecture and core components such as the interconnect, gives the machine its extraordinary performance.
Admittedly, when discussing the Shenwei 26010's advantages, one cannot ignore its weak spot: a memory bandwidth of only 136.51 GB/s from DDR3. By contrast, Intel's KNL and NVIDIA's Tesla use 3D-stacked memory, with bandwidths of 512 GB/s (Intel Xeon Phi) and 720 GB/s (NVIDIA Tesla). Because the per-chip memory bandwidth is limited, real-world applications have difficulty running close to peak performance. For applications with high memory-bandwidth demands, the chip does not perform as well in practice as Xeon Phi and Tesla.
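The bandwidth limitation can be made concrete with the roofline model, a standard way to bound a kernel's throughput by the smaller of a chip's compute peak and its memory bandwidth times the kernel's arithmetic intensity (flops per byte moved). The sketch below plugs in only the two figures from the article (3.06 TFlops and 136.51 GB/s); the STREAM-triad-like kernel and its byte count (three 8-byte doubles per element, ignoring write-allocate traffic) are illustrative assumptions.

```python
# Minimal roofline-model sketch: attainable throughput is capped either by
# the chip's compute peak or by memory bandwidth times arithmetic intensity.

SW26010_PEAK_GFLOPS = 3060.0   # double-precision peak, from the article
SW26010_BW_GBS = 136.51        # DDR3 memory bandwidth, from the article


def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      flops_per_byte: float) -> float:
    """Roofline bound (GFlops) for a kernel with the given arithmetic intensity."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (flops/byte) a kernel needs before it can reach peak."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    ridge = ridge_point(SW26010_PEAK_GFLOPS, SW26010_BW_GBS)
    print(f"ridge point: {ridge:.1f} flops/byte")

    # Assumed example kernel a[i] = b[i] + s * c[i]: 2 flops per element,
    # 24 bytes moved (read b and c, write a; write-allocate ignored).
    triad_intensity = 2 / 24
    bound = attainable_gflops(SW26010_PEAK_GFLOPS, SW26010_BW_GBS, triad_intensity)
    print(f"triad bound: {bound:.1f} GFlops ({bound / SW26010_PEAK_GFLOPS:.2%} of peak)")
```

Under these assumptions a kernel needs more than 22 flops per byte to approach the chip's peak, which is why bandwidth-hungry codes fall far short of it; a kernel at the same intensity on a chip with several times the bandwidth would have a proportionally higher bound.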
Every time China makes a technical breakthrough, a "find-a-daddy" crowd pops up online, and Sunway TaihuLight topping the list is no exception: some people linked the Shenwei 26010 to the DEC Alpha and declared the Alpha its "father." In fact, Shenwei does have historical ties to the Alpha, but they are very weak, and today's Shenwei and the DEC Alpha of the past are completely different things (after all, it has been almost 18 years since Compaq acquired DEC). Shenwei was once described as an "Alpha-like" independent instruction set; the author contacted a scientist working on Shenwei, who stated clearly that it is an independent instruction set. At this release event, the units involved also stated clearly that the chip uses the Shenwei-64 instruction set (which is NOT related to the DEC Alpha instruction set). Most users will not hand Shenwei a "father," let alone one that has been dead for 18 years.
| The unique design concept of the Shenwei 26010
Today's supercomputers either build their compute nodes from a CPU plus an accelerator, or use a single type of CPU throughout.
(Figure from Xinhua)
The CPU + accelerator approach is called heterogeneous computing.
Take the US Titan and China's Tianhe-2 as examples. Titan has 18,688 compute nodes, each consisting of one 16-core AMD Opteron 6274 processor and one NVIDIA Tesla K20 accelerator, for a total of 299,008 CPU cores. Tianhe-2 has 16,000 compute nodes, each with two Intel Xeon E5-2692 processors and three Xeon Phi cards, for a total of 32,000 E5-2692 processors and 48,000 Xeon Phi cards. Tianhe-1A uses 14,336 Intel Xeon X5670 processors and 7,168 NVIDIA Tesla M2050 compute cards. The compute nodes of these supercomputers all use a CPU + accelerator design, making them typical examples of heterogeneous computing.
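The totals above follow directly from multiplying node counts by per-node composition. As a minimal sketch, a heterogeneous system can be described as a node count plus a node recipe; the class and field names here are hypothetical, and only the counts quoted in the article are used (the article gives no per-CPU core count for Tianhe-2, so it is left unset).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeType:
    """Parts in one compute node (hypothetical structure for illustration)."""
    cpus_per_node: int
    accelerators_per_node: int
    cores_per_cpu: Optional[int] = None  # None when the article does not say


@dataclass(frozen=True)
class System:
    name: str
    nodes: int
    node: NodeType

    def total_cpus(self) -> int:
        return self.nodes * self.node.cpus_per_node

    def total_accelerators(self) -> int:
        return self.nodes * self.node.accelerators_per_node

    def total_cpu_cores(self) -> Optional[int]:
        if self.node.cores_per_cpu is None:
            return None
        return self.total_cpus() * self.node.cores_per_cpu


# Counts taken from the article.
TITAN = System("Titan", 18_688, NodeType(cpus_per_node=1, accelerators_per_node=1,
                                         cores_per_cpu=16))
TIANHE_2 = System("Tianhe-2", 16_000, NodeType(cpus_per_node=2, accelerators_per_node=3))

if __name__ == "__main__":
    for s in (TITAN, TIANHE_2):
        print(s.name, s.total_cpus(), s.total_accelerators(), s.total_cpu_cores())
```

Keeping the node recipe separate from the node count mirrors how these machines are actually described: one node design replicated thousands of times.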
A system that uses only one type of CPU is called homogeneous computing.
For example, Japan's K computer uses only Fujitsu SPARC64 VIIIfx processors, Sunway BlueLight uses only 8,704 Shenwei 1600 processors, and Mira and Sequoia use only PowerPC A2 processors; none of them uses GPUs or many-core accelerator chips. Japan's K computer, IBM's Mira and Sequoia, and China's Sunway BlueLight are representative homogeneous supercomputers.
The Shenwei 26010 and Sunway TaihuLight are rather special. If homogeneous computing is defined as a system whose computing units all share the same instruction set and architecture, then TaihuLight, which uses only the Shenwei 26010, could be considered homogeneous. In practice, however, TaihuLight's double-precision peak of 125 PFlops and sustained 93 PFlops are achieved by accelerator-style computing. In essence, the Shenwei 26010 combines a CPU and an accelerator on one chip: its 260 cores come in two kinds, management cores, which play a CPU-like role, and compute cores, which play an accelerator-like role. This lets a single Shenwei 26010 chip do the work of two products, Intel E5 + Xeon Phi or POWER + Tesla.
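The arithmetic behind "260 cores" and "3.06 TFlops" can be sketched as follows. The layout and throughput figures here are assumptions taken from published descriptions of the chip, not from this article: 4 core groups, each with 1 management core and 64 compute cores; 16 double-precision flops per cycle for a management core and 8 for a compute core; and 40,960 nodes with one chip each in TaihuLight. Under those assumptions the numbers reproduce both the article's per-chip peak and its 125 PFlops system peak.

```python
# Assumed SW26010 layout (from published descriptions, not from the article).
CORE_GROUPS = 4
MANAGEMENT_CORES_PER_GROUP = 1   # CPU-like role
COMPUTE_CORES_PER_GROUP = 64     # accelerator-like role
CLOCK_HZ = 1.45e9                # clock frequency, from the article

# Assumed double-precision flops per cycle per core.
MANAGEMENT_FLOPS_PER_CYCLE = 16
COMPUTE_FLOPS_PER_CYCLE = 8

NODES = 40_960                   # assumed TaihuLight node count, one chip per node


def total_cores() -> int:
    return CORE_GROUPS * (MANAGEMENT_CORES_PER_GROUP + COMPUTE_CORES_PER_GROUP)


def chip_peak_flops() -> float:
    """Peak double-precision flops of one chip: management plus compute cores."""
    management = CORE_GROUPS * MANAGEMENT_CORES_PER_GROUP * MANAGEMENT_FLOPS_PER_CYCLE * CLOCK_HZ
    compute = CORE_GROUPS * COMPUTE_CORES_PER_GROUP * COMPUTE_FLOPS_PER_CYCLE * CLOCK_HZ
    return management + compute


if __name__ == "__main__":
    print("cores per chip:", total_cores())
    print(f"chip peak: {chip_peak_flops() / 1e12:.2f} TFlops")
    print(f"system peak: {NODES * chip_peak_flops() / 1e15:.1f} PFlops")
```

Note that under these assumptions the 256 compute cores supply about 97% of the chip's peak, which is why the chip behaves more like an accelerator with a few built-in CPU cores than like a conventional CPU.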
Compared with Intel E5 + Xeon Phi or POWER + Tesla, however, the Shenwei 26010's two kinds of cores share memory, which avoids the explicit data copies those CPU + accelerator pairs require, easing pressure on memory and reducing performance loss. Presumably for this reason, the Shenwei 26010's cache and memory are both on the small side, because its memory-access model can be very simple. It essentially abandons the complex memory-management model of existing CPUs and hands the job of scheduling memory entirely to developers: the chip supports only the simplest memory-access model, the hardware does not enforce cache coherence (Intel KNL, by contrast, makes the hardware responsible for cache coherence), and synchronization is left to software. This unconventional design gives the Shenwei 26010 high performance and low power consumption at the same time, making up for its weak spot in memory.
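"Handing memory scheduling to developers" typically means the programmer, not a hardware cache, decides what data sits in a core's small local memory. A minimal sketch of that style, assuming a 64 KiB local buffer (a commonly reported per-compute-core size, not stated in the article): data is copied in tile by tile, processed, and copied back explicitly. The functions `dma_get` and `dma_put` are hypothetical stand-ins written in plain Python, not a real Shenwei API.

```python
from __future__ import annotations

LOCAL_BYTES = 64 * 1024   # assumed size of a compute core's local memory
DOUBLE_BYTES = 8


def dma_get(main_mem: list[float], start: int, count: int, local: list[float]) -> None:
    """Hypothetical explicit copy from main memory into the local buffer."""
    local[:count] = main_mem[start:start + count]


def dma_put(local: list[float], count: int, main_mem: list[float], start: int) -> None:
    """Hypothetical explicit copy from the local buffer back to main memory."""
    main_mem[start:start + count] = local[:count]


def axpy_tiled(alpha: float, x: list[float], y: list[float]) -> list[float]:
    """Compute y = alpha * x + y, staging every tile through a fixed-size local buffer."""
    # Tiles of x and y must fit in local memory at the same time.
    tile = LOCAL_BYTES // (2 * DOUBLE_BYTES)
    local_x = [0.0] * tile
    local_y = [0.0] * tile
    n = len(x)
    for start in range(0, n, tile):
        count = min(tile, n - start)
        dma_get(x, start, count, local_x)   # software decides what is local
        dma_get(y, start, count, local_y)
        for i in range(count):
            local_y[i] = alpha * local_x[i] + local_y[i]
        dma_put(local_y, count, y, start)   # and when results go back
    return y


if __name__ == "__main__":
    xs = [float(i) for i in range(10_000)]
    ys = [1.0] * 10_000
    print(axpy_tiled(2.0, xs, ys)[:5])
```

The trade-off is visible even in this toy: the hardware can stay simple because nothing is cached implicitly, but the programmer must size tiles and schedule every transfer; on a real many-core chip, coordinating cores that share data would also be the software's job.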
| How powerful is Sunway TaihuLight?
Sunway TaihuLight topped the TOP500 list on the strength of its double-precision floating-point performance: a peak of 125 PFlops and a sustained 93 PFlops, an earth-shaking result.
In fact, besides its outstanding double-precision performance, TaihuLight also offers a series of other advantages: high efficiency, low power consumption, high performance per watt, and a small footprint.
High performance: TaihuLight's double-precision peak is 125 PFlops and its sustained performance is 93 PFlops. By comparison, the US supercomputer Titan has a peak of 27 PFlops and sustained performance of 17.6 PFlops, and Tianhe-2 has a peak of 54.9 PFlops and sustained performance of 30.65 PFlops. TaihuLight's sustained performance is therefore about 5.3 times Titan's.
High efficiency: TaihuLight's machine efficiency (sustained performance divided by peak) reaches 74.16%, compared with 65.19% for Titan and 55.83% for Tianhe-2. The faster and larger a supercomputer is, the harder it becomes to raise its machine efficiency, yet TaihuLight, with more than five times Titan's sustained performance, still clearly beats Titan on efficiency. An efficiency this high is astonishing.
Low power: TaihuLight consumes 15.3 MW, compared with about 8.2 MW for Titan and 17.8 MW for Tianhe-2. In other words, TaihuLight's sustained performance is about three times Tianhe-2's, yet it consumes less power than Tianhe-2.
Performance per watt: TaihuLight reaches 6 GFlops/W, dwarfing its competitors on the TOP500 list: Tianhe-2 manages 1.95 GFlops/W, Titan 2.143 GFlops/W, Sequoia 2.069 GFlops/W, and Japan's K computer 0.830 GFlops/W. Even on the global Green500 list, TaihuLight would place third. The first- and second-place machines on the Green500 use only low-power Intel E5 processors and have very modest overall performance, and even supercomputers using NVIDIA K80 accelerators reach only 4.7 GFlops/W. TaihuLight's performance per watt is therefore especially impressive.
Small footprint: TaihuLight's cabinets occupy 605 square meters, compared with 404 square meters for Titan and 720 square meters for Tianhe-2.
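The comparison figures above come from a few simple ratios: machine efficiency is sustained (Rmax) over peak (Rpeak), speedup is a ratio of sustained performance, and performance per watt is sustained flops over power. A minimal sketch using the rounded values quoted in the article; because 125 and 93 PFlops are rounded, TaihuLight's efficiency comes out at about 74.4% here rather than the reported 74.16%, while Titan's and Tianhe-2's match the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    rpeak_pflops: float   # theoretical peak
    rmax_pflops: float    # sustained (Linpack) performance

    @property
    def efficiency(self) -> float:
        """Machine efficiency: sustained performance divided by peak."""
        return self.rmax_pflops / self.rpeak_pflops


def speedup(a: Machine, b: Machine) -> float:
    """How many times b's sustained performance a delivers."""
    return a.rmax_pflops / b.rmax_pflops


def gflops_per_watt(rmax_pflops: float, power_mw: float) -> float:
    # 1 PFlops = 1e6 GFlops and 1 MW = 1e6 W, so the scale factors cancel.
    return (rmax_pflops * 1e6) / (power_mw * 1e6)


# Rounded figures as quoted in the article.
TAIHULIGHT = Machine("Sunway TaihuLight", 125.0, 93.0)
TITAN = Machine("Titan", 27.0, 17.6)
TIANHE_2 = Machine("Tianhe-2", 54.9, 30.65)

if __name__ == "__main__":
    for m in (TAIHULIGHT, TITAN, TIANHE_2):
        print(f"{m.name}: efficiency {m.efficiency:.2%}")
    print(f"vs Titan: {speedup(TAIHULIGHT, TITAN):.2f}x, "
          f"vs Tianhe-2: {speedup(TAIHULIGHT, TIANHE_2):.2f}x")
    print(f"TaihuLight: {gflops_per_watt(93.0, 15.3):.2f} GFlops/W")
```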
| Conclusion
Now that core components such as the CPU, operating system, and interconnect are all developed domestically, some critics, invoking a theory of "supercomputer performance surplus," have attacked Sunway TaihuLight as having more capability than anyone needs, calling it a useless vanity project.
As for this "performance surplus" theory, in my view computing never stops pursuing performance. Code can be modified to raise computational accuracy: given better resources, users will naturally increase the grid density or the number of particles, and with slight modifications achieve higher precision, which can then be used to tackle deeper problems. So there is no such thing as a supercomputer with too much performance, only one without enough.
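Why extra performance is absorbed so quickly can be shown with a rough cost model. Assume, purely for illustration, a 3D simulation with an explicit time-stepping scheme whose stable time step shrinks in proportion to grid spacing (a CFL-type limit). Refining the spacing by a factor r then multiplies the cell count by r³ and the number of steps by r, so cost grows as r⁴: halving the spacing costs 16 times more. Inverting the model shows that even a 3× faster machine buys only about a 1.32× finer grid.

```python
def refinement_cost_factor(refine: float, dims: int = 3, cfl_limited: bool = True) -> float:
    """Relative cost of refining grid spacing by `refine` (assumed cost model).

    Cost is taken as proportional to (number of cells) x (number of time steps).
    With a CFL-limited explicit scheme, the step count also grows by `refine`.
    """
    cells = refine ** dims
    steps = refine if cfl_limited else 1.0
    return cells * steps


def affordable_refinement(speedup: float, dims: int = 3, cfl_limited: bool = True) -> float:
    """Grid refinement factor that a given speedup pays for under the same model."""
    exponent = dims + 1 if cfl_limited else dims
    return speedup ** (1.0 / exponent)


if __name__ == "__main__":
    print("cost of halving grid spacing:", refinement_cost_factor(2))
    print(f"refinement bought by a 3x speedup: {affordable_refinement(3.0):.2f}x")
```

Under this assumed model, performance gains are consumed as fourth roots, which is the quantitative core of the argument that users will always find a use for more performance.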
Just as the Olympic motto is "Faster, Higher, Stronger," supercomputing will keep pursuing ever greater speed.
Leiphone note: the author is tieliu (WeChat ID: tieliu1988). For reprints, please contact Leiphone for authorization; the article may not be modified.
Recommended reading
1. After the US banned Xeon chips, what is China going through?
2. Why has Tianhe-2 won the supercomputing championship again and again? Possibly thanks to the US ban
3. Compared with Godson, why has Shenwei's road to its own chips been so much smoother?