Feature Similarity Analysis Notebook¶
This notebook analyzes a labeled dataset of pipette and tube images using:
- VOC XML annotations for bounding boxes
- ResNet-50 and ConvNeXt-Base for feature extraction (compared head-to-head)
- Cosine similarity for comparing image regions
- Grad-CAM for visualizing model attention
1. Setup and Imports¶
from pathlib import Path

from features import create_feature_extractor, extract_all_features
from gradcam import (
    create_gradcam,
    plot_gradcam_grid,
    plot_same_vs_cross_video_pairs,
    select_representative_samples,
)
from parsing import get_class_distribution, get_video_distribution, load_dataset
from similarity import (
    compute_cross_class_similarity,
    compute_full_similarity,
    compute_per_class_similarity,
    get_class_video_sources,
)
from visualization import (
    plot_class_comparison,
    plot_cross_class_similarity,
    plot_images_with_bboxes,
    plot_per_class_similarity,
)
2. Load Dataset¶
Parse the VOC XML annotations from the labeled dataset directory. The video boundary is set at frame 52: frames 1-51 belong to clip_1, frames 52 and later to clip_2.
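As context for what a parsed record looks like, here is a minimal sketch of the per-file parsing that load_dataset plausibly performs. The parse_voc_xml name, the dict keys, and the digit-based frame extraction are assumptions, not the actual parsing module code.

import xml.etree.ElementTree as ET
from pathlib import Path

def parse_voc_xml(xml_path: Path, boundary: int = 52) -> list[dict]:
    root = ET.parse(xml_path).getroot()
    image_path = root.findtext("path") or root.findtext("filename")
    # Assumed: the frame index is the digits in the file stem, e.g. frame_017.xml -> 17
    frame = int("".join(filter(str.isdigit, xml_path.stem)) or 0)
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "image_path": image_path,
            "video_source": "clip_1" if frame < boundary else "clip_2",
            "bbox": tuple(int(float(box.findtext(tag)))
                          for tag in ("xmin", "ymin", "xmax", "ymax")),
        })
    return objects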
DATASET_PATH = Path("pipette_angle_dataset_labelled")
VIDEO_BOUNDARY_FRAME = 52
# Load all annotations
all_objects = load_dataset(DATASET_PATH, VIDEO_BOUNDARY_FRAME)
print(f"Found {len(all_objects)} annotation objects")
# Show class distribution
class_counts = get_class_distribution(all_objects)
print("\nClass distribution:")
for cls, count in sorted(class_counts.items()):
    print(f" {cls}: {count}")

# Show video source distribution per class
video_dist = get_video_distribution(all_objects)
print("\nVideo source distribution per class:")
for cls, counts in sorted(video_dist.items()):
    print(f" {cls}: clip_1={counts['clip_1']}, clip_2={counts['clip_2']}")
Found 903 annotation objects

Class distribution:
 pipette_no_tube: 47
 pipette_tip_in_tube: 44
 tube_no_pipette: 768
 tube_with_pipette: 44

Video source distribution per class:
 pipette_no_tube: clip_1=23, clip_2=24
 pipette_tip_in_tube: clip_1=28, clip_2=16
 tube_no_pipette: clip_1=400, clip_2=368
 tube_with_pipette: clip_1=28, clip_2=16
3. Visualize Sample Images with Bounding Boxes¶
Display a grid of sample images with color-coded bounding boxes (a drawing sketch follows the list):
- 🟢 Green: tube_no_pipette
- 🟠 Orange: tube_with_pipette
- 🔴 Red: pipette_tip_in_tube
- 🔵 Blue: pipette_no_tube
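For reference, one plausible way plot_images_with_bboxes could render this color coding with matplotlib; the BOX_COLORS map, the draw_bboxes helper, and the obj dict keys are assumptions rather than the visualization module's actual internals.

import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from PIL import Image

BOX_COLORS = {  # assumed mapping; mirrors the legend above
    "tube_no_pipette": "green",
    "tube_with_pipette": "orange",
    "pipette_tip_in_tube": "red",
    "pipette_no_tube": "blue",
}

def draw_bboxes(image_path, objects, ax):
    ax.imshow(Image.open(image_path))
    for obj in objects:
        xmin, ymin, xmax, ymax = obj["bbox"]
        ax.add_patch(Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
                               fill=False, linewidth=2,
                               edgecolor=BOX_COLORS[obj["name"]]))
    ax.axis("off")

# usage sketch: fig, ax = plt.subplots(); draw_bboxes(img_path, annotations_by_image[img_path], ax)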
# Group annotations by image
annotations_by_image: dict[str, list] = {}
for obj in all_objects:
    img_path = obj["image_path"]
    if img_path not in annotations_by_image:
        annotations_by_image[img_path] = []
    annotations_by_image[img_path].append(obj)
# Plot sample images
plot_images_with_bboxes(annotations_by_image, sample_step=10, max_images=9)
Showing 9 of 91 annotated images
4. Compare Similar Classes¶
Visual comparison of pipette_tip_in_tube vs tube_with_pipette to understand
the semantic difference between these overlapping concepts.
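The padding=20 argument below suggests each region is cropped with some surrounding context. A hypothetical padded-crop helper, assuming the image_path and bbox keys from the parsed records:

from PIL import Image

def crop_with_padding(obj, padding=20):
    img = Image.open(obj["image_path"])
    xmin, ymin, xmax, ymax = obj["bbox"]
    # Expand the box by padding pixels on each side, clamped to the image bounds
    return img.crop((max(0, xmin - padding), max(0, ymin - padding),
                     min(img.width, xmax + padding), min(img.height, ymax + padding)))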
plot_class_comparison(
    all_objects,
    class1="pipette_tip_in_tube",
    class2="tube_with_pipette",
    n_samples=6,
    padding=20,
)
pipette_tip_in_tube: 44 samples
tube_with_pipette: 44 samples
5. Extract Features with ResNet-50¶
Use a pretrained ResNet-50 to extract a 2048-dimensional feature vector from each labeled bounding-box region.
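For reference, a plausible torchvision sketch of what create_feature_extractor("resnet50") sets up: the classification head is replaced with Identity so the forward pass returns the global-average-pooled 2048-d vector. The embed helper and crop argument are illustrative, not the features module API.

import torch
from torchvision.models import ResNet50_Weights, resnet50

weights = ResNet50_Weights.DEFAULT
backbone = resnet50(weights=weights).eval()
backbone.fc = torch.nn.Identity()   # drop the classifier head; output is the pooled 2048-d vector
preprocess = weights.transforms()   # the resize/crop/normalize pipeline matching the weights

def embed(crop):  # crop: a PIL image of one bounding-box region
    with torch.no_grad():
        return backbone(preprocess(crop).unsqueeze(0)).squeeze(0)  # shape (2048,)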
# Create ResNet-50 feature extractor
resnet_extractor, device, resnet_name = create_feature_extractor("resnet50")
print(f"Using device: {device}")
print(f"Model: {resnet_name}")
# Extract features for all objects
resnet_features, all_labels, resnet_features_by_class = extract_all_features(
    all_objects, resnet_extractor, device, verbose=True, model_name=resnet_name
)
Using device: mps
Model: resnet50
Extracting features using resnet50...
Processing 0/903...
Processing 50/903...
Processing 100/903...
Processing 150/903...
Processing 200/903...
Processing 250/903...
Processing 300/903...
Processing 350/903...
Processing 400/903...
Processing 450/903...
Processing 500/903...
Processing 550/903...
Processing 600/903...
Processing 650/903...
Processing 700/903...
Processing 750/903...
Processing 800/903...
Processing 850/903...
Processing 900/903...
Extracted 903 features
 tube_with_pipette: 44 features
 pipette_tip_in_tube: 44 features
 tube_no_pipette: 768 features
 pipette_no_tube: 47 features
6. Extract Features with ConvNeXt-Base¶
Use a pretrained ConvNeXt-Base to extract 1024-dimensional feature vectors for comparison with ResNet-50.
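The analogous sketch for ConvNeXt-Base, again an assumption about the features module: drop only the final Linear layer, keeping the LayerNorm and Flatten so the forward pass returns the 1024-d pre-classifier feature.

import torch
from torchvision.models import ConvNeXt_Base_Weights, convnext_base

weights = ConvNeXt_Base_Weights.DEFAULT
backbone = convnext_base(weights=weights).eval()
backbone.classifier[-1] = torch.nn.Identity()  # classifier is [LayerNorm2d, Flatten, Linear]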
# Create ConvNeXt-Base feature extractor
convnext_extractor, device, convnext_name = create_feature_extractor("convnext_base")
print(f"Model: {convnext_name}")
# Extract features for all objects
convnext_features, _, convnext_features_by_class = extract_all_features(
    all_objects, convnext_extractor, device, verbose=True, model_name=convnext_name
)
Model: convnext_base
Extracting features using convnext_base...
Processing 0/903...
Processing 50/903...
Processing 100/903...
Processing 150/903...
Processing 200/903...
Processing 250/903...
Processing 300/903...
Processing 350/903...
Processing 400/903...
Processing 450/903...
Processing 500/903...
Processing 550/903...
Processing 600/903...
Processing 650/903...
Processing 700/903...
Processing 750/903...
Processing 800/903...
Processing 850/903...
Processing 900/903...
Extracted 903 features
 tube_with_pipette: 44 features
 pipette_tip_in_tube: 44 features
 tube_no_pipette: 768 features
 pipette_no_tube: 47 features
7. Compute Similarity Matrices¶
Calculate cosine similarity between all pairs of feature vectors for both models.
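As a sketch of what compute_full_similarity plausibly reduces to (the function below is illustrative, not the similarity module's actual code): L2-normalize the rows, then compute all pairwise dot products in a single matrix multiply.

import numpy as np

def cosine_similarity_matrix(features: np.ndarray) -> np.ndarray:
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T  # (N, N) matrix with entries in [-1, 1]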
# Compute full similarity matrices
resnet_full_sim = compute_full_similarity(resnet_features)
convnext_full_sim = compute_full_similarity(convnext_features)
print(f"ResNet-50 similarity matrix shape: {resnet_full_sim.shape}")
print(f"ConvNeXt-Base similarity matrix shape: {convnext_full_sim.shape}")
# Compute per-class similarity matrices
resnet_class_sim = compute_per_class_similarity(resnet_features_by_class)
convnext_class_sim = compute_per_class_similarity(convnext_features_by_class)
ResNet-50 similarity matrix shape: (903, 903)
ConvNeXt-Base similarity matrix shape: (903, 903)
8. Per-Class Similarity Matrices (ResNet-50)¶
Individual similarity matrices for each class using ResNet-50 features.
all_video_sources = [obj["video_source"] for obj in all_objects]
unique_classes = sorted(set(all_labels))
# Get video sources per class
class_video_sources = get_class_video_sources(
    all_labels, all_video_sources, unique_classes
)
print("ResNet-50 Per-Class Similarity:")
plot_per_class_similarity(resnet_class_sim, class_video_sources)
ResNet-50 Per-Class Similarity:
9. Per-Class Similarity Matrices (ConvNeXt-Base)¶
Individual similarity matrices for each class using ConvNeXt-Base features.
print("ConvNeXt-Base Per-Class Similarity:")
plot_per_class_similarity(convnext_class_sim, class_video_sources)
ConvNeXt-Base Per-Class Similarity:
10. Cross-Class Similarity Comparison¶
Mean cosine similarity between each pair of classes for both models.
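A hypothetical equivalent of compute_cross_class_similarity, shown for intuition: average the block of the full similarity matrix that belongs to each pair of classes. Whether the real helper excludes the diagonal from intra-class blocks (self-similarity is always 1) is a detail this sketch glosses over.

import numpy as np

def mean_block_similarity(full_sim, class_indices, classes):
    out = np.zeros((len(classes), len(classes)))
    for i, ci in enumerate(classes):
        for j, cj in enumerate(classes):
            # Mean over the sub-matrix of similarities between class ci and class cj
            out[i, j] = full_sim[np.ix_(class_indices[ci], class_indices[cj])].mean()
    return out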
# Build class indices
class_indices: dict[str, list[int]] = {cls: [] for cls in unique_classes}
for i, label in enumerate(all_labels):
    class_indices[label].append(i)

# Compute cross-class similarities
resnet_cross_sim = compute_cross_class_similarity(
    resnet_full_sim, class_indices, unique_classes
)
convnext_cross_sim = compute_cross_class_similarity(
    convnext_full_sim, class_indices, unique_classes
)

print("ResNet-50 Cross-Class Similarity:")
plot_cross_class_similarity(resnet_cross_sim, unique_classes)

print("\nConvNeXt-Base Cross-Class Similarity:")
plot_cross_class_similarity(convnext_cross_sim, unique_classes)

# Print comparison statistics
print("\n" + "=" * 60)
print("Similarity Statistics Comparison")
print("=" * 60)
for i, cls in enumerate(unique_classes):
    resnet_intra = resnet_cross_sim[i, i]
    convnext_intra = convnext_cross_sim[i, i]
    print(f"\n{cls}:")
    print(
        f" Intra-class similarity: ResNet={resnet_intra:.3f}, ConvNeXt={convnext_intra:.3f}"
    )
ResNet-50 Cross-Class Similarity:
ConvNeXt-Base Cross-Class Similarity:
============================================================
Similarity Statistics Comparison
============================================================

pipette_no_tube:
 Intra-class similarity: ResNet=0.477, ConvNeXt=0.490

pipette_tip_in_tube:
 Intra-class similarity: ResNet=0.539, ConvNeXt=0.609

tube_no_pipette:
 Intra-class similarity: ResNet=0.546, ConvNeXt=0.644

tube_with_pipette:
 Intra-class similarity: ResNet=0.618, ConvNeXt=0.719
11. Grad-CAM Visualization¶
Visualize where ResNet-50 focuses attention when processing each class. Representative samples are selected based on intra-class similarity (a selection sketch follows the list):
- High similarity (typical examples)
- Median similarity
- Low similarity (outliers)
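A hypothetical version of select_representative_samples consistent with this description: score each sample by its mean cosine similarity to the rest of its class, then keep the most typical, the median, and the least typical sample.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def pick_high_median_low(features, labels, cls):
    idx = np.array([i for i, lbl in enumerate(labels) if lbl == cls])
    sim = cosine_similarity(features[idx])
    np.fill_diagonal(sim, np.nan)        # ignore each sample's self-similarity
    typicality = np.nanmean(sim, axis=1)
    order = idx[np.argsort(typicality)]  # global indices, least to most typical
    return [order[-1], order[len(order) // 2], order[0]]  # high, median, low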
# Create Grad-CAM (uses ResNet-50)
gradcam, resnet_full = create_gradcam(device)
# Select representative samples based on ResNet features
representative_samples = select_representative_samples(
    resnet_features, all_labels, unique_classes
)
print("Representative samples selected:")
for cls, indices in representative_samples.items():
    print(f" {cls}: {len(indices)} samples")
Representative samples selected:
 pipette_no_tube: 3 samples
 pipette_tip_in_tube: 3 samples
 tube_no_pipette: 3 samples
 tube_with_pipette: 3 samples
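For intuition about how the attention maps are produced, here is a compact hook-based Grad-CAM sketch; create_gradcam in the gradcam module may be implemented differently. It weights the target layer's activations by the spatially averaged gradients of the top-class score.

import torch
import torch.nn.functional as F
from torchvision.models import ResNet50_Weights, resnet50

def gradcam_heatmap(model, target_layer, image_tensor):
    acts, grads = {}, {}
    fwd = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    bwd = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    logits = model(image_tensor)              # image_tensor: preprocessed (1, 3, H, W)
    logits[0, logits[0].argmax()].backward()  # backprop the top class score
    fwd.remove()
    bwd.remove()
    channel_w = grads["g"].mean(dim=(2, 3), keepdim=True)  # GAP of gradients per channel
    cam = F.relu((channel_w * acts["a"]).sum(dim=1))       # weighted sum of activations
    return (cam / (cam.max() + 1e-8)).squeeze(0)           # (h, w) heatmap in [0, 1]

model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
# heatmap = gradcam_heatmap(model, model.layer4, preprocessed_crop)  # preprocessed_crop is hypothetical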
# Plot Grad-CAM visualizations
plot_gradcam_grid(representative_samples, all_objects, gradcam, device)
12. Same-Video vs Cross-Video Pairs¶
For each class, compare pairs from the same video versus pairs from different videos. This demonstrates how video source affects similarity scores.
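A hypothetical re-implementation of the comparison behind plot_same_vs_cross_video_pairs, assuming the video_source field from the parsed records: split within-class pairs by whether both members come from the same clip, then compare the mean similarities.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def same_vs_cross_video_means(features, labels, objects, cls):
    idx = [i for i, lbl in enumerate(labels) if lbl == cls]
    sim = cosine_similarity(features[idx])
    vids = [objects[i]["video_source"] for i in idx]
    same, cross = [], []
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            (same if vids[a] == vids[b] else cross).append(sim[a, b])
    return float(np.mean(same)), float(np.mean(cross))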
ResNet-50¶
print("ResNet-50:")
plot_same_vs_cross_video_pairs(resnet_features, all_labels, all_objects, unique_classes)
ResNet-50:
ConvNeXt-Base¶
print("ConvNeXt-Base:")
plot_same_vs_cross_video_pairs(
    convnext_features, all_labels, all_objects, unique_classes
)
ConvNeXt-Base:
Next Steps¶
See Key Findings for analysis of these results and why the data is insufficient for reliable angle prediction.