Patchdrivenet |best| Here

is a novel neural network architecture designed for real-time driving scene perception. It leverages a patch-based tokenization strategy to efficiently process high-resolution road images. Unlike traditional CNNs or Vision Transformers that operate on full frames or regular grids, PatchDriveNet extracts semantically meaningful patches (e.g., vehicles, lane markings, traffic signs) using a learnable patch selection module. This enables adaptive computation and improved performance on edge devices.

Pro-tip: Start with a pre-trained global backbone and freeze it for the first 10 epochs, training only the saliency head with a binary mask loss (where the mask comes from an oracle that knows where the objects are). patchdrivenet