Description: Scratching Visual Transformer's Back with Uniform Attention
The authors decide to inject uniform attention because
(1) uniform attention is the densest form of attention and is unstable from the gradient point of view,
(2) yet uniform attention is easy to supply by hand.
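
As a rough illustration of why uniform attention is easy to supply by hand: when every attention weight equals 1/N, each output token is simply the mean of all tokens, so no learned attention map is needed. The sketch below uses NumPy and assumes an additive injection (adding the mean token back onto every token). The function names `uniform_attention` and `inject_uniform_attention` are hypothetical and chosen only for this example; the note itself does not specify how the injection is done.

```python
import numpy as np


def uniform_attention(values):
    """Apply attention whose weights are all 1/N.

    values: array of shape (n_tokens, dim).
    Every output row is the mean of all input rows.
    """
    n_tokens = values.shape[0]
    # Uniform attention matrix: each row sums to 1, all entries equal.
    weights = np.full((n_tokens, n_tokens), 1.0 / n_tokens)
    return weights @ values


def inject_uniform_attention(tokens):
    """Assumed additive injection: add the mean token to every token."""
    context = tokens.mean(axis=0, keepdims=True)  # shape (1, dim)
    return tokens + context  # broadcast over the token axis


tokens = np.arange(12, dtype=np.float64).reshape(4, 3)
injected = inject_uniform_attention(tokens)
```

The matrix form and the mean form give the same result, which is the point: the densest attention pattern reduces to a single averaging step that can be written directly instead of learned.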