This release contains implicit GEMM algorithm performance updates and bug fixes.
Various host side improvements for better find and tuning performance.
Implementation of four additional generic tensor reduction operations (AVG, AMAX, NORM1, NORM2).
Fixed a bug where Batchnorm would give incorrect results when the product of image height and image width is not a factor of four.
Various host side performance improvements have been added as well.
Added a GPU reference kernel implementation for faster testing.
Add TargetID support for new AMD GPU architectures.

This release contains new reduction operations, Winograd algorithm performance improvements as well as bug fixes.
Various bug fixes in 3x3 assembly kernels.
Various bug fixes for MIOpenGEMM on the OpenCL backend.
Updates for Target ID features in ROCm stack.

This release contains various bug fixes and performance improvements.
Various other bug fixes and performance improvements.
Improved MIOpen build time by splitting large kernel header files.
Fixed an issue in reduction kernels for padded tensors.
Updated the performance data for new kernel versions.

This release includes support for Navi21 and various other bug fixes and performance improvements.
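For readers unfamiliar with the four reduction operations named in the notes above (AVG, AMAX, NORM1, NORM2), here is a minimal sketch of what reductions with these names conventionally compute. It is plain Python, not MIOpen code: the function names are hypothetical, and it assumes MIOpen follows the usual definitions (mean, maximum absolute value, sum of absolute values, and square root of the sum of squares).

```python
import math

# Hypothetical helpers illustrating the conventional meaning of each
# reduction name; these are NOT MIOpen API calls.

def reduce_avg(values):
    # AVG: arithmetic mean of the elements
    return sum(values) / len(values)

def reduce_amax(values):
    # AMAX: largest absolute value among the elements
    return max(abs(v) for v in values)

def reduce_norm1(values):
    # NORM1: sum of absolute values (L1 norm)
    return sum(abs(v) for v in values)

def reduce_norm2(values):
    # NORM2: square root of the sum of squares (L2 norm)
    return math.sqrt(sum(v * v for v in values))

def reduce_rows(matrix, op):
    # Apply a reduction along the last dimension of a 2-D list,
    # producing one value per row.
    return [op(row) for row in matrix]

data = [[3.0, -4.0], [1.0, 2.0]]
for name, op in [("AVG", reduce_avg), ("AMAX", reduce_amax),
                 ("NORM1", reduce_norm1), ("NORM2", reduce_norm2)]:
    print(name, reduce_rows(data, op))
```

A GPU library would perform the same arithmetic over a chosen set of tensor dimensions in parallel. The row-wise helper here only stands in for reducing along one dimension.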