AI workloads require a significant amount of compute power for training. These workloads are typically handled by high-performance servers with clusters of GPUs connected by high-speed interconnects.
Once a neural network has been trained, the weights file can be pruned and optimized for running in inference mode. An optimized weights file can then be run on embedded platforms or on servers with GPUs optimized for inference workloads.
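One common pruning approach is magnitude-based pruning, which zeroes out the smallest-magnitude weights so the resulting sparse model is cheaper to store and run. The sketch below is illustrative only, assuming NumPy; the function name `magnitude_prune` and the threshold logic are assumptions, not a specific framework's API.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so roughly `sparsity`
    fraction of the entries become zero (illustrative sketch)."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Find the magnitude of the k-th smallest absolute weight.
    flat = np.abs(weights).ravel()
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    # Zero every weight at or below the threshold magnitude.
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = magnitude_prune(w, 0.5)
print(float(np.mean(pruned == 0.0)))  # fraction of zeroed weights
```

In practice, frameworks apply pruning iteratively during or after training and often combine it with quantization before deploying to inference hardware.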