Exploring the Potential of the Segment Anything Model

rw-book-cover

Metadata

Segment Anything Model’s (SAM) potential impact as a new foundation model for computer vision is exciting, particularly in the geospatial imagery community. With SAM, users can drop a point on objects in satellite imagery and, in many cases, automatically generate great segmentation predictions. This model uses a two-stage prediction approach. First, an encoder converts an image to image features (embeddings). Then, a decoder + encoded prompt converts a user supplied input (point(s), bounding box, or text) into segmentation masks that delineate the boundary of an object or objects (Figure 1). In our initial explorations, SAM is more robust and has better domain adaptability than Imagenet or COCO pre-trained weights out of the box. What’s more, SAM can be adapted on the fly for particular detection problems through user prompting! With prompting, as a user supplies inputs that relate to objects of interest, the model adapts to get better at generating segments for the problem at hand. (View Highlight)