For a neural network to run at its fastest, the underlying hardware must run efficiently on all layers. Through the inference of any CNN—whether it be based on an architecture such as YOLO, ResNet, or Inception—the workload regularly shifts from being bottlenecked by memory to being bottlenecked by compute resources. You can think of each convolutional layer as its own mini-workload, and so...
» read more