New speed: ~12.31-12.38 seconds on an A100 SXM4 (through Colab). Many, many thanks to @99991 (https://github.com/99991/cifar10-fast-simple) for their help with finding the issues that eventually led to some of these improvements, as well as detailed debugging and verification of results on their end. I encourage you to check out some of their work! <3 :)