Whenever anything enters a bubble, it is hard not to worry about when it will burst, and today's AI chips are widely recognized to be in one.
From Cambricon's DianNao at ASPLOS'14 to Google's current TPUv3, it took only five years for AI chips to become hugely successful. Riding the fast track of exploding demand for AI compute, and proclaiming the end of Moore's law, Domain Specific Architecture seems to be the only way out.
But when countless giants and startups are designing nearly identical AI chips, we need to answer one question: do we really need so many AI chips?
Software complexity
With the rapid development of AI chips, one unavoidable problem is the exponential growth of software complexity. Many companies have taken two years or less to build a chip, only to find that it takes even longer to support a wide range of frameworks, keep up with advances in algorithms, and adapt to platforms ranging from phones to data centers. Once the deployment and mass-production window is missed, even a finished chip quickly falls behind.
Unlike designing general-purpose architectures, designing special-purpose architectures such as AI chips requires designing and optimizing the software at the same time. Chip companies are often optimistic about the cost of software adaptation and optimization, expecting middleware and compilers to solve all the problems. In reality, from Intel to Google to Nvidia, large numbers of software engineers are devoted to adapting to various platforms and hand-tuning network performance. For startups, the software often lags the chip by years.
In essence, the further we push the potential of a chip architecture, the harder it becomes to abstract the software layer, because the model and parameters of the underlying architecture have to leak into the upper abstraction. The common practice today is to build middleware between the underlying chip architecture and the upper-level software, but the cost of developing such middleware is routinely underestimated. Some time ago, a former classmate at a chip startup asked me how much manpower and time it would take to develop an inference middleware similar to TensorRT. It was not an easy question to answer, so I asked how many resources they had for the project.
Surprisingly, his boss had allocated only three or four people, on the assumption that they already had a low-level compiler and an upper-level model conversion tool, so a middleware layer for architectural abstraction would not require much effort. My guess is that this level of investment can produce a functionally complete product, but I doubt the final product will meet its performance targets in real applications; after all, a chip is not built just to run a benchmark like ResNet-50.
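To make the underestimation concrete, here is a minimal, hypothetical sketch of the shape such an inference middleware takes. The names (Engine, register_kernel, the "npu_v1" target) are illustrative assumptions, not any real product's API. The skeleton itself is tiny; the real cost is the kernel table behind it, which has to cover every framework operator on every chip generation, plus fusion, quantization, and fallback paths.

```python
# Hypothetical sketch of a TensorRT-like inference middleware core.
# The happy path fits on one screen; operator coverage is where the effort goes.
import numpy as np

KERNELS = {}  # (target, op_type) -> callable

def register_kernel(target, op_type):
    """Register a hand-written kernel for one op on one target."""
    def wrap(fn):
        KERNELS[(target, op_type)] = fn
        return fn
    return wrap

# One kernel for a hypothetical chip generation. In reality this is where
# tiling, on-chip memory layout, DMA scheduling, and quantization live.
@register_kernel("npu_v1", "MatMul")
def matmul_npu(a, b):
    return a @ b

# CPU fallback for ops the device does not (yet) support.
@register_kernel("cpu", "Relu")
def relu_cpu(x):
    return np.maximum(x, 0.0)

class Engine:
    """Runs an imported graph on a target, falling back to CPU for uncovered ops."""
    def __init__(self, graph, target):
        self.graph, self.target = graph, target

    def run(self, *inputs):
        vals = list(inputs)
        for op_type, arg_ids in self.graph:
            fn = KERNELS.get((self.target, op_type)) or KERNELS.get(("cpu", op_type))
            if fn is None:
                raise NotImplementedError(f"{op_type} unsupported on {self.target} and cpu")
            vals.append(fn(*[vals[i] for i in arg_ids]))
        return vals[-1]

# Toy graph: y = relu(x @ w). Every new framework op and every new chip revision
# adds rows to KERNELS -- that table, not this skeleton, is the real cost.
graph = [("MatMul", (0, 1)), ("Relu", (2,))]
x, w = np.random.randn(2, 4), np.random.randn(4, 3)
print(Engine(graph, target="npu_v1").run(x, w))
```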
Fragmentation
Software engineers have long wanted to write a single set of code that runs on different platforms. The fragmentation of AI chips into different architectures will greatly discourage them from applying AI in real software products. Unlike past experience, the poor interpretability of deep learning brings many unexpected drawbacks. A common frustration, for example, is that a proprietary model achieves satisfactory results on a local CPU, only to suffer significant degradation once deployed to a particular device. How do we debug these problems, who is responsible for debugging, what tools are available, and does the debugging engineer even have access to the private model? These are difficult questions to answer.
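As a rough illustration of what that debugging looks like in practice, here is a small sketch that compares a float32 CPU reference against a simulated low-precision device path layer by layer, to see where the outputs start to diverge. The int8 "device" here is an assumption for illustration, not a description of any particular chip.

```python
# Cross-backend consistency check: run the same layers on a CPU reference and on
# a simulated low-precision device, then report per-layer divergence.
import numpy as np

def quantize_int8(x):
    """Fake device precision: symmetric per-tensor int8 quantize/dequantize."""
    scale = np.abs(x).max() / 127.0 or 1.0
    return np.round(x / scale).clip(-127, 127) * scale

def run_layers(x, weights, device=False):
    outs = []
    for w in weights:
        if device:
            # Simulated device path: inputs and weights pass through int8.
            x = quantize_int8(quantize_int8(x) @ quantize_int8(w))
        else:
            x = x @ w  # float32 CPU reference
        x = np.maximum(x, 0.0)  # ReLU
        outs.append(x)
    return outs

rng = np.random.default_rng(0)
weights = [rng.standard_normal((64, 64)) * 0.1 for _ in range(4)]
x = rng.standard_normal((1, 64))

# Error typically grows layer by layer, which is exactly what makes the
# "works on my CPU, degrades on the device" problem hard to localize.
for i, (ref, dev) in enumerate(zip(run_layers(x, weights),
                                   run_layers(x, weights, device=True))):
    rel_err = np.abs(ref - dev).max() / (np.abs(ref).max() + 1e-9)
    print(f"layer {i}: max relative error {rel_err:.4f}")
```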
Fragmentation also occurs because proprietary architectures tend to abandon forward compatibility in pursuit of absolute performance. The middleware mentioned above, for example, faces a fragmented set of AI software frameworks at one end and successive generations of chip architectures at the other. How do you maintain multiple, partially incompatible instruction set architectures and ensure that every software update fully covers all devices? There is no answer other than investing more manpower. A common argument is that, like today's consumer chips, an AI chip only needs short-term (two to three years of) software support; yet in common AI chip applications such as smart cameras, industrial intelligence, and autonomous driving, the life cycle of a chip can be as long as ten years. It is hard to imagine how big a company needs to be to provide such lasting technical support. If a startup may well be gone in two or three years, how can its products be safely deployed in a consumer production car?
AI chips are a transitional product
As a software engineer, I personally believe that custom AI processors are only a transitional product. A unified, programmable, highly concurrent architecture is what we should be pursuing. Looking back over the last two decades, we have seen the market for dedicated minicomputers shrink, graphics processors evolve into general-purpose vector processors, and even our mobile and desktop platforms converge. There is reason to believe that pouring resources into custom AI chips is not a good investment.