Approach Overview

Problem Understanding

Before attempting to build a prediction system, I needed to answer two questions:

  1. Can we reliably detect pipettes and tubes?
  2. Are pretrained features useful for distinguishing tube and pipette angles?

My Analysis Pipeline

Step 1: Label the Dataset

I manually labeled the dataset with bounding boxes using LabelImg in Pascal VOC format. The object classes are:

Class                Description
tube_no_pipette      Test tubes without any pipette nearby
tube_with_pipette    Tubes that have a pipette inserted
pipette_tip_in_tube  The pipette tip region inside a tube
pipette_no_tube      Pipettes not positioned over tubes

This resulted in ~900 labeled bounding boxes across 91 annotated frames.
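LabelImg's Pascal VOC output is plain XML, so the labels can be read back with the standard library. The sketch below is my own illustration (function name and paths are hypothetical), not the project's actual loading code:

```python
# A minimal sketch of reading the Pascal VOC annotations that LabelImg writes.
# The class names follow the table above; the function name is mine.
import xml.etree.ElementTree as ET

def load_voc_boxes(xml_path):
    """Return (class_name, (xmin, ymin, xmax, ymax)) for each labeled object."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        coords = tuple(int(float(bb.findtext(k)))
                       for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, coords))
    return boxes
```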

Step 2: Extract Features

I used two pretrained CNN architectures to extract feature vectors from each labeled region:

  • ResNet-50 (ImageNet V2 weights) → 2048-dimensional features
  • ConvNeXt-Base (ImageNet V1 weights) → 1024-dimensional features

Both models were pretrained on ImageNet (over a million natural images), so their features should already encode general-purpose visual structure.

Step 3: Compute Similarity Matrices

For each model, I computed cosine similarity between all pairs of feature vectors. This reveals:

  • Intra-class similarity: How similar are features within the same object class?
  • Inter-class similarity: How similar are features between different classes?
  • Cross-video similarity: Does the video source affect feature similarity?
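The pairwise computation itself is a single normalized matrix product; a sketch in numpy (function name is mine):

```python
# Sketch of the pairwise cosine-similarity computation for n feature vectors.
import numpy as np

def cosine_similarity_matrix(features):
    """features: (n, d) array -> (n, n) matrix of cosine similarities."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return normed @ normed.T
```

With unit-normalized rows, entry (i, j) is exactly the cosine of the angle between feature vectors i and j, and the diagonal is 1.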

Step 4: Visualize with Grad-CAM

Grad-CAM shows where the model "looks" when processing an image. This helps understand what features the models are capturing.
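For reference, Grad-CAM can be implemented in a few lines with PyTorch hooks: capture the target layer's activations and gradients, weight each activation channel by its average gradient, and keep the positive part. This is a generic sketch under those assumptions, not the notebook's exact code; for ResNet-50 the target layer would typically be the last block of `layer4`:

```python
# Minimal Grad-CAM sketch using forward/backward hooks on a PyTorch CNN.
import torch

def grad_cam(model, target_layer, image, class_idx):
    """Return a normalized (H, W) heatmap for `class_idx` on one image."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: acts.update(v=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(v=go[0]))
    try:
        logits = model(image.unsqueeze(0))
        model.zero_grad()
        logits[0, class_idx].backward()
    finally:
        h1.remove(); h2.remove()
    weights = grads["v"].mean(dim=(2, 3), keepdim=True)  # avg-pooled gradients
    cam = torch.relu((weights * acts["v"]).sum(dim=1)).squeeze(0)
    return cam / (cam.max() + 1e-8)
```

The resulting map is at the spatial resolution of the chosen layer and is usually upsampled onto the input image for display.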


What I Hoped to Find

If pretrained models were useful, I expected:

  1. High intra-class similarity — Same object types should have similar features
  2. Low inter-class similarity — Different object types should be distinguishable
  3. Consistent cross-video features — The same object class should look similar regardless of which video it came from
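Expectations 1 and 2 are directly measurable from the similarity matrix: average the block of entries belonging to a pair of classes, skipping each vector's similarity with itself. A sketch of that check (helper name is mine):

```python
# Sketch of measuring mean intra-class vs. inter-class similarity,
# given a precomputed (n, n) similarity matrix and per-sample labels.
import numpy as np

def mean_block_similarity(sim, labels, a, b):
    """Mean similarity between samples of class `a` and class `b`."""
    labels = np.asarray(labels)
    block = sim[np.ix_(labels == a, labels == b)]
    if a == b:  # exclude each sample's similarity with itself
        block = block[~np.eye(block.shape[0], dtype=bool)]
    return float(block.mean())
```

If pretrained features are informative, intra-class means (a == b) should sit well above the inter-class means.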

What I Actually Found

The reality was different. See Key Findings for the detailed analysis.


See the Code

The full implementation with all visualizations is available in the Analysis Notebook.