Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

One good approach could be to base the architecture on the TPU v1 from [1]. There are also open-source accelerators you could get inspiration from, for example [2][3]. If you want to do less work/not hand code the RTL yourself then you could look into methods for automatically mapping OpenCL to an FPGA accelerator architecture (or a service like [3] provides pre-designed architectures for multiple FPGAs).

[1] https://arxiv.org/abs/1704.04760

[2] https://github.com/jofrfu/tinyTPU

[3] https://github.com/tensil-ai/tensil



Thanks!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: