uniform-attention.github.io - Scratching Visual Transformer's Back with Uniform Attention

Description: Scratching Visual Transformer's Back with Uniform Attention


Example domain paragraphs

The decision to inject uniform attention rests on two observations:

    (1) uniform attention is the densest form of attention, and it is unstable to learn from a gradient point of view

    (2) but humans can supply uniform attention easily
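
The two observations above can be illustrated with a minimal sketch in plain Python. It assumes that "uniform attention" means every token receives the same weight 1/N, and that "injecting" it means adding the average of all tokens to each token. The helper name `inject_uniform_attention` and the toy token values are hypothetical and not taken from the project page; the gradient-stability claim in (1) is not modeled here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(weights):
    """Shannon entropy in nats; a higher value means denser attention."""
    return -sum(w * math.log(w) for w in weights if w > 0)

def attend(weights, values):
    """Weighted sum of value vectors under one row of attention weights."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def inject_uniform_attention(tokens):
    """Hypothetical helper (an assumption, not the authors' code):
    add the average of all tokens to every token."""
    n = len(tokens)
    dim = len(tokens[0])
    mean = [sum(t[d] for t in tokens) / n for d in range(dim)]
    return [[t[d] + mean[d] for d in range(dim)] for t in tokens]

# Toy data: four tokens with two features each (made up for illustration).
tokens = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0], [0.0, 2.0]]

uniform = softmax([0.0] * len(tokens))  # equal scores give weights of 1/N
peaked = softmax([4.0, 0.0, 0.0, 0.0])  # one dominant score gives sparse weights

# Uniform weights reach the maximum possible entropy, log(N).
print(entropy(uniform), entropy(peaked), math.log(len(tokens)))

# Attending with uniform weights is the same as averaging the tokens,
# so it can be supplied by hand without learning any attention scores.
print(attend(uniform, tokens))
print(inject_uniform_attention(tokens))
```

Under these assumptions, the uniform row has the highest entropy of any attention distribution over N tokens, which is one way to read "densest" in (1). Because its output is just the token average, it needs no learned parameters, which is one way to read (2).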
