NeuReality targets deep learning inference workloads on the edge, aiming to reduce CAPEX and OPEX for infrastructure owners
The AI chip space is booming, with innovation coming from a slew of startups in addition to the usual suspects. You may never have heard of NeuReality before, but it seems likely you’ll be hearing more about it after today.
NeuReality is a startup founded in Israel in 2019. Today it has announced NR1-P, which it dubs a novel AI-centric inference platform. That’s a bold claim for a previously unknown company, and a very short time in which to get there, even if it is the first of more implementations to follow.
ZDNet connected with NeuReality CEO and co-founder Moshe Tanach to find out more.
Founded by industry veterans
Tanach has more than 20 years of experience in semiconductors and systems, having worked on solutions from compute and wireless all the way to data center networking and storage. He and his co-founders, VP Operations Tzvika Shmueli and VP VLSI Yossi Kasus, go a long way back, and have an impressive list of previous experience in key positions between them.
Habana Labs, Intel, Marvell and Mellanox are a few of the companies NeuReality’s founders have worked at. Xilinx is also a key partner for NeuReality, as Tanach explained. At this point, NR1-P is implemented as a prototype on Xilinx FPGAs. The goal is to eventually implement NR1-P as a system-on-chip (SoC).
NeuReality has already started demonstrating NR1-P to customers and partners, although names were not disclosed. The company claims the prototype platform validates its technology and allows customers to integrate it in orchestrated data centers and other facilities.
Tanach distilled NeuReality’s philosophy by saying that systems and semiconductors should be designed from the outside to the inside: “You need to understand the system. If you can build the system, as Qualcomm is doing, they’re building a phone and a base station in order to make the best chips for phones”.
From the get-go, NeuReality made the choice to focus on inference workloads exclusively. As Tanach noted, dealing with how to train AI models has attracted lots of attention, and resulted in very expensive computer pods that have excellent results in training models.
But when you push AI into real-life applications, you need to care about how the model is deployed and used; hence, inference. And when you try to use an expensive training pod for inference, the cost of every AI operation stays very high, and it’s hard to solve the two problems together.
This philosophy was also part of what brought Dr. Naveen Rao, former General Manager of Intel’s AI Products Group, to NeuReality’s Board of Directors. Rao was the founder of Nervana, which was acquired by Intel in 2016. While working at Intel, Rao had two product lines, one for training and one for inference.
The pendulum of compute
Rao appreciates NeuReality’s “fresh view”, as Tanach put it. But what does that entail, exactly? NR1-P leans heavily on FPGA solutions, which is why the partnership with Xilinx is very important. Xilinx, Tanach noted, is not just programmable logic and FPGAs:
“When you look at how their advanced FPGAs are built today, they are a system on a chip. They have ARM processors inside in their latest Versal ACAP technology. They also integrated an array of VLIW engines that you can program. And together with them we could build a 16-card server chassis that is very powerful”.
NeuReality implemented NR1-P in Xilinx FPGAs, so they didn’t have to fabricate anything; they just built the chassis. As Tanach noted, they worked with Xilinx and came up with an inference engine that is autonomous and is implemented inside the FPGA. An SoC is under development, and will be introduced in early 2022.
This means that NR1-P does not target embedded chips, as it would not be practical to use FPGAs for that. Even when the SoC is there, however, NeuReality will keep targeting near-edge solutions:
“Edge devices need even more optimized solutions, specially designed for the needs of the device. You need to do things in microwatts, milliwatts, or less than 50 milliwatts. But there’s the pendulum of compute. The current trend is to push more and more compute to the cloud. But we’re starting to see the pendulum coming back.
Look at the Microsoft and AT&T deal to build many data centers across the US in AT&T facilities to bring more compute power closer to the edge. Many IoT devices will not be able to embed AI capabilities because of cost and power, so they will need a compute server to serve them closer to where they are. Going all the way to the cloud and back introduces high latency”.
An object-oriented hardware architecture
NeuReality’s “secret sauce” is conceptually simple, as per Tanach: other deep learning accelerators out there may do a very good job of offloading the neural net processing from the application, but they are PCI devices. They must be installed in a whole server, and that costs a lot.
The CPU is the center of the system and when it offloads things, it runs the driver of the device. That’s not the case for NeuReality. NR1-P is an autonomous device, connected to the network. It has all the data path functions, so they don’t need to run in software. This removes the host CPU as a bottleneck and eliminates the need for additional devices. Tanach referred to this as object-oriented hardware:
“The main object here is the AI compute engine. We’ve been using object-oriented software for a long time and it changed the way we code things. We wrap the main object with the functions that it needs. It’s time to develop hardware that does the same. If you want to invest in AI compute engines, make it the main thing”.
Another topic Tanach touched upon is the communication protocol used. Inference solutions like Nvidia’s use REST APIs, which makes for very expensive networking, he noted. NeuReality has other ways of doing it, which it will disclose in the future.
Last but not least, elasticity and utilization in cloud data centers are also important. Existing deep learning accelerators are out of this equation, Tanach said. Kubernetes connections and communication with the orchestrator are all handled on the CPU hosting these deep learning accelerators. NeuReality integrates these functions into the device.
All that translates to a very low cost for AI inference operations, both in terms of capital expense and operational expense, Tanach went on to add. At this time, FPGAs can be used in data centers, and places like 5G base stations where power is less of an issue. The SoC will come in two flavors, one for data centers and another with lower cost and power specifications for nodes closer to the edge.
NeuReality claims a 15X improvement in performance per dollar compared to the available GPUs and ASICs offered by deep learning accelerator vendors. When asked about a reference for those claims, Tanach mentioned using MLPerf as the basis for internal benchmarking. NeuReality will share proposed updates to MLPerf soon, Tanach added.
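To make the metric concrete, performance per dollar can be read as throughput divided by system cost. The sketch below is purely illustrative: the function names and every number in it are hypothetical, chosen only to show the arithmetic, and none of them come from NeuReality, MLPerf, or any vendor.

```python
def perf_per_dollar(inferences_per_second: float, system_cost_usd: float) -> float:
    """Throughput delivered per dollar of system cost."""
    if system_cost_usd <= 0:
        raise ValueError("system cost must be positive")
    return inferences_per_second / system_cost_usd


def improvement_factor(candidate: float, baseline: float) -> float:
    """How many times better the candidate is than the baseline."""
    return candidate / baseline


# Hypothetical figures, NOT measured data:
# a baseline accelerator that also needs a full host server,
# and a candidate network-attached device with a lower total cost.
baseline = perf_per_dollar(inferences_per_second=10_000, system_cost_usd=20_000)
candidate = perf_per_dollar(inferences_per_second=30_000, system_cost_usd=4_000)

factor = improvement_factor(candidate, baseline)
print(f"Improvement in performance per dollar: {factor:.1f}x")
```

The point of the sketch is that the ratio moves with both terms: a gain can come from higher throughput, from a cheaper total system, or from both, which is why the reference benchmark and the cost basis matter when comparing such claims.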
Besides delivering its SoC, NeuReality is also working on delivering its software stack. The goal is to be able to work with whatever machine learning framework people are using, be it PyTorch or TensorFlow or anything else. Tanach noted that ONNX makes this easier, and NeuReality is investing in software.
The future of AI compute offload is to completely offload the pipeline, he went on to add. The promise is that NeuReality’s software stack will support a compute graph representation that will enable that.
In terms of customers, NeuReality is targeting three segments: hyperscalers and next-wave cloud service providers; solution providers that build data centers for clients such as the military, governments and the financial industry; and, last but not least, OEMs.
Today’s announcement follows NeuReality’s emergence from stealth in February 2021, with an $8M seed investment. Admittedly, it’s still early days for NeuReality. However, the company’s background and first signs make it seem worth keeping an eye on.