What kind of chip does ChatGPT need?

Recently, generative models, led by ChatGPT, have become a new hotspot in artificial intelligence. Silicon Valley companies such as Microsoft and Google have invested heavily in the technology (Microsoft invested $10 billion in OpenAI, the company behind ChatGPT, and Google recently released its self-developed Bard model), and Chinese internet companies such as Baidu have said they are developing similar technology and will launch it in the near future.

Generative models such as ChatGPT share a common feature: they are pre-trained on massive amounts of data and are usually paired with a powerful language model. The language model's main job is to learn from a large existing corpus. Once trained, it can understand a user's language instructions and generate relevant text in response.

The language generation model

Generative models fall roughly into two categories: language generation models and image generation models. Language generation models are represented by ChatGPT. As mentioned above, after training on massive data its language model can both understand the meaning of user instructions and generate relevant text in response. This means ChatGPT needs a large language model (LLM) big enough to understand the user's language and produce high-quality output; for example, the model must know how to write poetry. It also means that large language models in language-based generative AI need a very large number of parameters to carry out this kind of complex learning and retain so much information. Taking ChatGPT as an example, its parameter count is as high as 175 billion (about 700 GB of storage at 4 bytes per parameter, the standard 32-bit floating-point format), so its language model truly is "large".
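The 700 GB figure follows from simple arithmetic, sketched below. The helper name and the table of byte widths are illustrative assumptions, not something from the original article; the estimate covers the weights only and ignores activations and optimizer state.

```python
# Bytes per parameter for a few common numeric formats (assumed widths).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_storage_gb(num_params: int, dtype: str = "fp32") -> float:
    """Return the storage needed for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# 175 billion parameters, the figure quoted in the text.
params = 175_000_000_000
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_storage_gb(params, dtype):.0f} GB")
```

With 32-bit floats this reproduces the 700 GB quoted above; narrower formats shrink the footprint proportionally, which is one reason reduced precision is attractive for models of this size.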

The image generation model

The other type of generative model is the image generation model, represented by diffusion models. Typical examples include DALL-E from OpenAI, Imagen from Google, and the most popular, Stable Diffusion from Runway. These models also use a language model to understand the user's instructions and then generate high-quality images accordingly. Unlike in language generation models, the language model here is used only to understand user input, not to generate language output, so its parameter count can be much smaller (on the order of hundreds of millions). The image diffusion model itself is also relatively modest in size, generally on the order of billions of parameters, but its computational cost is not small, because the resolution of the generated images or video can be very high.

Trained on massive data, generative models can produce output of unprecedented quality. They already have many clear application markets, including search, chatbots, and image generation and editing, and more applications are expected in the future. This in turn creates demand for suitable chips.

Chip requirements of generative models

As mentioned earlier, generative models such as ChatGPT must learn from a large amount of training data to produce high-quality output. To support efficient training and inference, they place their own requirements on the underlying chips.

The first is distributed computing. Language generation models such as ChatGPT have hundreds of billions of parameters, so training and inference on a single machine is practically impossible; the work has to be spread across a large number of machines. Distributed computing of this kind places heavy demands on the interconnect bandwidth between machines and on the compute chips' support for distributed communication (such as RDMA), because the bottleneck is often not computation but data interconnect. At this scale in particular, efficient chip-level support for distributed computing becomes the key.
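To illustrate why interconnect bandwidth can dominate, here is a minimal sketch of the textbook cost model for a ring all-reduce, one common way of combining data across devices. The article does not say which communication algorithm is used, so the choice of ring all-reduce, the function name, and all numbers below are assumptions for illustration; per-message latency is ignored.

```python
def ring_allreduce_seconds(payload_bytes: float, num_devices: int,
                           link_gbytes_per_s: float) -> float:
    """Bandwidth-only estimate of ring all-reduce time.

    In a ring all-reduce each device sends (and receives) roughly
    2 * (N - 1) / N times the payload size over its link.
    """
    if num_devices < 2:
        return 0.0  # nothing to exchange with a single device
    traffic_bytes = 2 * (num_devices - 1) / num_devices * payload_bytes
    return traffic_bytes / (link_gbytes_per_s * 1e9)


# Hypothetical example: 10 GB exchanged among 8 devices.
for bandwidth in (25, 300):  # GB/s per link, illustrative values only
    t = ring_allreduce_seconds(10e9, 8, bandwidth)
    print(f"{bandwidth} GB/s link: {t:.3f} s per all-reduce")
```

Under these assumed numbers the slower link spends twelve times longer on each exchange, and if a training step computes for less time than that, the chips sit idle waiting for data.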

The second is memory capacity and bandwidth. Although distributed training and inference are unavoidable for language generation models, the local memory capacity and bandwidth of each chip still largely determine the execution efficiency of a single chip (because each chip's memory is used to the limit). An image generation model (about 20 GB) can currently fit in a chip's memory, but its memory requirements may grow as image generation models evolve. From this perspective, ultra-high-bandwidth memory technology, represented by HBM, will become an inevitable choice for the relevant accelerator chips, and generative models will in turn push HBM toward higher capacity and bandwidth. Beyond HBM, new memory technologies such as CXL, combined with software optimization, will also increase the capacity and performance of local memory in such applications, and are likely to gain wider industrial adoption from the rise of generative models.
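A rough way to see why bandwidth matters as much as capacity: any pass that has to read every weight once from local memory cannot finish faster than the model size divided by the memory bandwidth. The sketch below applies that lower bound to the 20 GB figure from the text. The function names, the capacity value, and the bandwidth values are hypothetical and are not specifications of any real chip.

```python
def fits_in_memory(model_gb: float, capacity_gb: float) -> bool:
    """Check whether the model weights fit in one chip's local memory."""
    return model_gb <= capacity_gb


def min_weight_read_seconds(model_gb: float, bandwidth_gb_per_s: float) -> float:
    """Lower bound on the time to stream all weights once from memory."""
    return model_gb / bandwidth_gb_per_s


model_gb = 20.0  # image generation model size quoted in the text
print("fits in a 32 GB chip:", fits_in_memory(model_gb, 32.0))  # assumed capacity
for bandwidth in (500.0, 1000.0, 2000.0):  # GB/s, illustrative values only
    ms = min_weight_read_seconds(model_gb, bandwidth) * 1000
    print(f"{bandwidth:.0f} GB/s: at least {ms:.0f} ms per full pass over the weights")
```

A diffusion model typically runs many such passes per image, so this bound is paid repeatedly, which is why higher-bandwidth memory translates fairly directly into faster generation.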

The last is computation. Both language and image generation models have very large computing requirements. As image generation models move to ever higher resolutions and toward video applications, the demand for computing power may rise sharply: the computational cost of a current mainstream image generation model is about 20 TFLOPs, and with the trend toward high resolution and video, a computing power requirement of 100-1000 TFLOPS is likely to become the standard.

To sum up, we believe that generative models place requirements on chips in distributed computing, memory, and computation, which together touch almost every aspect of chip design. More importantly, these requirements must be combined in a balanced way so that no single aspect becomes the bottleneck, which makes this a systems engineering problem in chip design.

GPUs and new AI chips: who has the better chance?

Generative models place new requirements on chips. Between GPUs (represented by Nvidia and AMD) and new AI chips (represented by Habana and Graphcore), who has the better chance of capturing this new demand and market?

First, consider language generation models. Because of their huge parameter counts they need good distributed computing support, so GPU vendors with a complete ecosystem in this area currently have the advantage. This is a systems engineering problem that requires a complete software and hardware solution, and here Nvidia offers its Triton Inference Server alongside its GPUs. Triton supports distributed inference: it can divide a model into multiple parts and process them on different GPUs, which solves the problem that the memory of a single GPU cannot hold so many parameters. In the future, whether one uses Triton directly or builds further on top of it, a GPU with a complete ecosystem will be more convenient. From a computation point of view, the main workload of language generation models is matrix computation, which is itself a strength of GPUs, so new AI chips have no obvious advantage over GPUs in this respect.
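The idea of dividing a model into parts that each fit on one GPU can be sketched with a simple greedy partition over consecutive layers. This is a hypothetical illustration of the concept only: it is not Triton's API or its actual placement algorithm, and the layer sizes and GPU capacity are made-up numbers.

```python
from typing import List


def partition_layers(layer_gb: List[float], gpu_capacity_gb: float) -> List[List[int]]:
    """Greedily assign consecutive layers to GPUs without exceeding capacity.

    Returns one list of layer indices per GPU.
    """
    stages: List[List[int]] = []
    current: List[int] = []
    used = 0.0
    for idx, size in enumerate(layer_gb):
        if size > gpu_capacity_gb:
            raise ValueError(f"layer {idx} ({size} GB) does not fit on one GPU")
        # Start a new GPU when the next layer would overflow the current one.
        if current and used + size > gpu_capacity_gb:
            stages.append(current)
            current, used = [], 0.0
        current.append(idx)
        used += size
    if current:
        stages.append(current)
    return stages


# Hypothetical model: 10 layers of 7 GB each, GPUs with 20 GB of usable memory.
placement = partition_layers([7.0] * 10, 20.0)
for gpu, layers in enumerate(placement):
    print(f"GPU {gpu}: layers {layers}")
```

Keeping each part contiguous means only the activations at the boundary between two parts cross the interconnect, which ties this back to the bandwidth requirement discussed earlier.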

From the perspective of image generation models, the parameter count is also large but one to two orders of magnitude smaller than that of language generation models. In addition, much of the computation still consists of convolutions, so in inference applications AI chips may have an opportunity if they are very well optimized. Such optimization includes large on-chip memory to hold parameters and intermediate results, and efficient support for convolution and matrix operations.

Generally speaking, the current generation of AI chips was designed mainly for smaller models (parameter counts around 100 million and compute around 1 TOPS), and the demands of generative models are far larger than those original design targets. GPUs trade efficiency for flexibility, whereas AI chips do the opposite and pursue efficiency on their target applications. We therefore believe that GPUs will remain the best option for accelerating generative models over the next year or two. However, as generative model designs stabilize and AI chip designs have time to catch up with the iteration of generative models, AI chips have the opportunity to surpass GPUs in efficiency in this field.

Prepare your supply chain

Buyers of electronic components must now prepare for rising prices, extended delivery times, and continuing supply chain challenges. Looking ahead, if prices and delivery times keep increasing, just-in-time (JIT) procurement may become increasingly impractical. Instead, buyers may need to adopt a "just in case" model, holding extra inventory and finished goods to guard against long lead times and supply chain interruptions.

As shortages and supply chain interruptions continue, communication with customers and suppliers will be essential. Regular communication with suppliers helps buyers prepare for longer delivery times and stay informed about changing market conditions. Regular communication with customers helps them manage expectations about potential delays, rising prices, and longer delivery times. This is essential to soften the impact of such news, or at least to ensure that customers are not caught off guard by sudden changes in this chaotic market.

Most importantly, buyers of electronic components must take steps to expand and improve their supplier network. In this era, managing your supply chain requires every link to work as a cohesive unit. A distributor that acts merely as an agent rather than as a partner cannot weather the storms of this market. Communication and transparency are essential for management and planning. At E-energy Holding Limited, we hedge against these market conditions for our customers in the following ways:

Our supplier network has been reviewed and improved for more than ten years.
Our strategic locations around the world enable us to visit and vet a company's headquarters before making a purchase decision.
E-energy Holding Limited works with a reputable testing agency to conduct in-depth inspections and tests before delivering parts to our customers.
Our procurement is concentrated on franchised distributors and manufacturer-direct sales.
Our customer managers are committed to providing the highest level of service, communication, and transparency. Beyond simply taking orders, your customer manager will also help you develop solutions, plan inventory and delivery schedules, maintain inventory levels for regularly purchased parts, and ensure the authenticity of your parts.

Add E-energy Holding Limited to your list of approved suppliers, and let our team help you with strategic and wise procurement.