Developing a deep learning compiler stack that takes neural network descriptions (CNNs/RNNs) created in frameworks such as Caffe, PyTorch, and TensorFlow and converts them into code suitable for execution on special-purpose and embedded platforms
Developing optimized implementations of a variety of neural-network operations and integrating them into a runtime framework
Developing new optimization techniques and algorithms to efficiently map CNNs onto a wide range of Tensilica Xtensa processors and specialized hardware
Benchmarking end-to-end network performance on a variety of DSP and special-purpose accelerator platforms
Enhancing the framework to improve overall functionality and performance on the various hardware platforms
Devising multiprocessor/multicore partitioning and scheduling strategies
Developing complex programs to validate the functionality and performance of the CNN application programming kit
Working with hardware designers to identify opportunities for additional hardware acceleration of neural network functions
Working with industry-leading partners and customers to design and standardize neural network APIs
