A Conversation with a Pioneer of Deep Learning at the Edge, Kurt Keutzer

Krishna Rangasayee

Recently I was honored to talk with Kurt Keutzer, professor of Electrical Engineering and Computer Science (EECS) at the University of California at Berkeley, and co-director of the Berkeley Deep Drive Industrial Consortium. Before joining Berkeley’s faculty in 1998, he had a 15-year career in industry, most recently as CTO and Senior Vice President at the electronic design automation (EDA) company Synopsys. He’s also been a successful entrepreneur, serving as an investor/advisor to over 30 startups.

Kurt’s work has covered a lot of ground, but these days he’s most widely recognized for making deep learning very efficient, particularly for edge/mobile applications.

SiMa.ai welcomed Kurt to peel back his own layers – we were intrigued to hear more about his impressive career and unique perspective on ML dynamics at the edge. We were also surprised (and delighted) to hear about Kurt’s passion for meditation and the traditional cultures of China, India, and Tibet.

Read on for an in-depth conversation with one of the deep learning industry’s most talented minds.

Krishna: Very few people have been successful in academia, as a public company CTO, and as an entrepreneur. What is your best advice for success for all of us?

Kurt: I’ve spent a lot of time reflecting on the key elements of success, and I’m not embarrassed to say that I read “Think and Grow Rich” when I was young, and I was very influenced by it. I’m big on positive thinking and visualizing what you want to accomplish. However, I think the other key element is Jung’s insight that, although we think of our psyches, or our minds, as a monolith, like a single “Wizard of Oz” figure in a mission control center, our psyches actually consist of a whole committee.

So, I think visualizing what you want is one thing, but getting the psychological integration needed to get the whole committee together behind a vision is another. We need to ask ourselves: are we all on board? Do we have any saboteurs? Do we have dissenters? Is anybody blackballing this vision? It has taken me years of sitting in meditation to get my committee together. So, if I were to distill all that down, I would say: get a vision of what you want to accomplish and then get the whole committee in your psyche behind it.

Krishna: You’ve been part of the industry since before ML became cool, and you really pioneered many paths. What’s your perspective today on the ML dynamics in the cloud and ML dynamics in the edge?

Kurt: Recently I was surprised when my colleague at Berkeley, Pieter Abbeel, said “most of the computation in the future is going to go on in the cloud.” I don’t think it’s that simple. Is computation in the cloud going to continue to grow? Of course. Is computation on mobile devices at the edge going to continue to grow? Of course. The pertinent question for me was where I wanted to focus in my own research. As a nerd, it was clear to me that I could distinguish myself more at the edge than I could in the cloud. At the edge you can design systems that are much more tailored to the problem, and when you tailor things you can do more comprehensive optimizations. You can get much closer to the silicon than when you’re forced to think about supporting a diverse workload in the cloud. Ironically, you can then turn around and apply all you’ve learned about efficiency at the edge to improve the efficiency of applications in the cloud!

In terms of business opportunities, I wrote a little one-page position paper in Communications of the ACM saying that “if I could only design one chip, it would be an inference chip at the edge.” I think that was back in 2016. I’ve felt for a long time that the opportunities at the edge are much more diverse.

Krishna: There’s a lot of talk around training at the edge. Do you really think, pragmatically, in the next two to three years, inference devices will be doing some localized training or is it still more of a research area?

Kurt: Imagine you want to build an app that tailors bedtime stories to the user’s own voice. Users give the app permission to listen to them speaking on their phone, and the app then trains a kind of style transformer for the user’s voice. That app would enable a parent to take a standard bedtime story, such as “Goldilocks and the Three Bears,” and play it in their own voice for their kids. In this case, unless you’re happy with this app putting a bunch of your conversations in the cloud, you’re going to want to train that neural net locally.

I do think we will see more training at the edge, but I think one key thing that people miss is that we don’t have to do it fast. You can take the data we might gather and leave your phone to train on it for a few nights. Even if it has to do a lot of conversion or a lot of work on the CPU, it’s still going to be able to train locally.

Krishna: Do you see transformers at the edge becoming the norm in the next two to three years?

Kurt: That’s a really good one. First, it’s fascinating that today we’re seeing so many problems solved with a single algorithmic paradigm: the neural net. There are different types of neural nets – convolutional neural nets, recurrent neural nets, LSTMs – but it’s still a single paradigm. One might think that we’re really at an inflection point and we’ll see other algorithmic techniques give superior results again. For example, in the past, simulated annealing had a heyday when it was solving so many diverse problems, and then other optimization approaches caught up and surpassed it. So, we might think the same thing will happen with neural nets.

Do we see any sign of that happening? Absolutely not! In fact, we see the opposite. Now, a single family of neural nets, transformers, is providing the superior solution across applications in vision, audio, and natural language. That’s amazing!

However, I don’t think it’s a philosophical, or even a purely technical, issue. I think there’s a kind of sociological component: there are literally thousands of researchers who think that transformers are the way to go, and, so, there are thousands working on making transformers efficient, and making transformers work for different problems. So, it’s no surprise we are seeing transformers everywhere. I think that if you’re building a chip for the future you’ve got to think about how it is going to run transformers.

Krishna: That’s very insightful. At SiMa, we are obviously focused at the edge, computer vision being a priority. Where do you see more maturity in adoption?

Kurt: Autonomous vehicles, including advanced driver assistance systems in passenger cars; drones; surveillance; and anything that’s mobile. I think it will be a while before we see mobile robots. I think another emerging market will be augmented reality and virtual reality – AR/VR. Some think that with 5G you can do everything up in the cloud, but there are lots of reasons why that’s not going to work: there are concerns around availability, reliability, latency, and power, as well as privacy.

Krishna: I don’t know a single company that doesn’t have a raging debate on FP16 or FP8, or no FP8 [floating-point formats named for the number of bits they occupy]. I know your sentiment on this, but it’d be great to hear your thought process on what the world really needs, and why people are having debates every day on which format to pick.

Kurt: I think what you need to do is separate the customer issues from the purely technical ones. If you send us a neural net and say, “You’re just not going to be able to keep up the accuracy in int8,” we’ll take that as a challenge, and I’ll say, “I’ll put money on the table: we’ll be able to do it in int8.” There are companies, successful large companies, for which the debate is int4 versus int8, not fp8 versus int8. On the other hand, from the customer perspective, you may need a bunch of the best people in the world on this topic to get it down to int4 or even int8. In other words, there’s what’s technically possible and there’s what you can expect your customers to do. Those are two different things.

Krishna: I was going to make the same point. In the world of embedded, customers want the ML experience without the learning curve. People are saying, “Hey, I have this neural net running on NVIDIA and it’s in FP16. I am not going to spend the effort to really make any modifications, so good luck to you.” I think our world really seems to be still quite behind what’s technically feasible.

Kurt: I get that, and I’ll trust your experience, but I would point out that there is an implicit customer engagement assumption there that, frankly, I would question. My first investment was in a company that did processors with instruction set extensions. At our first technical advisory board meeting the startup shared its strategy of enabling customer success through focusing on the ease-of-use of their tool set. It makes sense, right? If you enable lots of customers to do their own customization it seems like focusing on a good tool set is going to get you more design wins than if you focus on full solutions for a few verticals.

However, my perspective was: don’t you want to concentrate on providing full solutions in some verticals to show your customers how it’s done, and then give them easy-to-use toolsets later? But startups are resource constrained, and management is all about resource allocation. So, they focused on making what they thought would be an easy-to-use tool set rather than delivering more complete solutions for a few verticals. Well, they kind of puttered along for like 14 years, and what finally made them successful was delivering full solution kits to their customers for a few verticals. Sigh, either you laugh, or you cry, right?

Krishna: You’ve studied ML from the context of so many different end points, particularly on the edge. How do you see the software/hardware partitioning problem from a peer perspective?

Kurt: One fantastic thing about Berkeley is we have this continuous stream of companies and researchers who come in seeking advice, and I always learn more than I tell. Just yesterday, another neural net accelerator company came in, and their software team is twice the size of their hardware team. And I think probably they’re never going to regret that decision to invest in software. In other words, software is the key to success even in neural-net accelerator startups.

Krishna: So, moving a little more to SiMa, you have a fantastic vantage point. What do you see as our strengths, and what do you see as our challenges when it comes to a competitive landscape of us versus NVIDIA, or Qualcomm, or maybe ML startups?

Kurt: First, I have to say, honestly, that I feel that the number one advantage that SiMa has is you, Krishna, as CEO. There’s nothing more important in a startup than the CEO. Also, there’s the culture you’ve built. I really admire how open you are to direct talk and questioning assumptions.

Sorry for another trip down memory lane, but after Synopsys won the synthesis market, I went back to talk to all the folks I could still find who had once been our competitors in that market. I asked them how they perceived that era and why Synopsys was successful. I was amazed when most said their synthesis product was fantastically successful in terms of the metrics that they set for themselves, but maybe Synopsys had better marketing or something. Only one old-timer gave a realistic assessment of their company’s performance and admitted, “We just blew it.”

The attitude I see here says: “Let’s be honest with ourselves. What’s the truth of the situation here? Where do we really stand? What do we have to do? What do we have to do to win the business?” I admire that so much. I feel that with you at the front and with the technical team you are putting together behind you, if you all wake up every day of the year holding yourselves accountable in that honest way, you’re sure to be successful.

Krishna: Outside of your professional life, can you share some of your hobbies and interests, and what else you do beyond everything you’re already doing?

Kurt: As you may notice in my CV, I did my undergraduate studies at Maharishi International University. I went there as an 18-year-old, when I’d already been practicing meditation for two years. Our culture is very externally oriented, and it focuses on how we manipulate the external world. Insights on manipulating the internal world? Not so much. I’m quite genuinely interested in the wisdom of the Indian, Tibetan, and Chinese cultures, each of which has a tradition of more than 1,000 years of really exploring inside. I’ve learned Tibetan reasonably well. I’ve studied just a bit of Sanskrit, and I’m learning a bit of Chinese now. I translate Tibetan texts, and I try to spend a fair amount of time going inside.

Krishna: This has been awesome. I really appreciate it. I know this will all enrich us and make us better in many different ways.

Kurt: These are great questions. This has been energizing. I’m very happy to do this, so thank you.