ViT Image Classification with Attention Visualization

Upload an image to classify its content and view the model's attention map on the right.

By default, the map highlights the attention from the [CLS] token, which acts as the model’s overall summary of the image, showing which image regions it attends to most when producing the classification.
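
As a rough illustration of how such a [CLS] map can be derived, here is a minimal sketch. It assumes the model exposes one layer's attention weights as a nested list of shape (heads, tokens, tokens), with [CLS] at token index 0 and the image patches following in row-major order. The function name `cls_attention_map`, the head averaging, and the min-max rescaling are illustrative choices, not necessarily what this demo does internally.

```python
def cls_attention_map(attn, grid_h, grid_w):
    """Return a grid_h x grid_w heatmap of [CLS]-to-patch attention.

    attn: nested list of shape (heads, tokens, tokens), where
    tokens = 1 + grid_h * grid_w and token 0 is [CLS] (assumed layout).
    """
    num_heads = len(attn)
    num_patches = grid_h * grid_w
    if len(attn[0][0]) != num_patches + 1:
        raise ValueError("token count does not match the patch grid")

    # Row 0 of each head is the [CLS] query. Skip column 0 ([CLS] to itself)
    # and average the remaining patch weights over the heads.
    flat = [
        sum(head[0][1 + p] for head in attn) / num_heads
        for p in range(num_patches)
    ]

    # Rescale to [0, 1] for display; a constant map becomes all zeros.
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    flat = [(v - lo) / span for v in flat]

    # Reshape the flat patch list into rows of the grid.
    return [flat[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]
```

The resulting grid has one value per patch, so it would still need to be upsampled to the image resolution before being overlaid on the picture.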

Click on any region of the image on the right to explore how the model attends to that specific area.
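
One way a click can be tied to the attention matrix is to convert the pixel position into a patch token index. The sketch below assumes the displayed image is divided into a regular grid of patches in row-major order, with [CLS] occupying token index 0. The function name `click_to_token_index` and its parameters are hypothetical and only serve the illustration.

```python
def click_to_token_index(x, y, image_w, image_h, grid_w, grid_h):
    """Map a click at pixel (x, y) to a token index in the attention matrix."""
    if not (0 <= x < image_w and 0 <= y < image_h):
        raise ValueError("click lies outside the image")

    # Scale pixel coordinates down to patch-grid coordinates.
    col = int(x * grid_w / image_w)
    row = int(y * grid_h / image_h)

    # +1 skips the [CLS] token at index 0; patches are assumed row-major.
    return 1 + row * grid_w + col


# Example with assumed sizes: a 224x224 image split into a 14x14 patch grid.
token = click_to_token_index(100, 50, 224, 224, 14, 14)  # token == 49
```

The attention row for the returned token index can then be reshaped and displayed in the same way as the [CLS] row.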