Bag of words
Parent: SLAM Index
Source: http://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb
Has its origins in natural language processing (NLP) and information retrieval
- A text can be seen as a bag of words: word order is ignored and only how often each word occurs is kept
- Texts can then be compared and classified by their word-frequency histograms (similar texts have similar histograms)
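A minimal sketch of the text case, using only the Python standard library. The helper names (`bag_of_words`, `cosine_similarity`) and the choice of cosine similarity as the comparison measure are assumptions made for illustration; the source only says that histograms are compared.

```python
from collections import Counter
import math
import re


def bag_of_words(text):
    """Return a word -> count histogram, ignoring case, order and punctuation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two word-count histograms (0.0 when no words are shared)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy sentences (made up for this example)
doc1 = bag_of_words("the robot maps the room")
doc2 = bag_of_words("the robot explores the room")
doc3 = bag_of_words("cats sleep all day")

sim_12 = cosine_similarity(doc1, doc2)  # 6/7, about 0.857: four of five words shared
sim_13 = cosine_similarity(doc1, doc3)  # 0.0: no words in common
```

The two robot sentences differ in one word, so their histograms are close; the third sentence shares no words with the first and scores zero.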
In vision
- Instead of words we have features (distinctive patterns in an image)
- An image is represented as a set of features
- Features consist of
  - Keypoints: "stand out" points that can be found again after the image is rotated, scaled or shifted
  - Descriptors: numeric description of the region around a keypoint, used for feature representation
- Construct a frequency histogram of features in the image
Workflow
Feature detection/extraction –> build vocabulary/codewords –> make histogram = BoW
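The last step of this workflow can be sketched as follows, assuming a vocabulary is already available. Each descriptor is assigned to its nearest codeword and the assignments are counted. The 2-D toy descriptors, the squared Euclidean distance and all function names are assumptions for illustration; real descriptors such as SIFT or ORB have many more dimensions, and binary descriptors are usually compared with Hamming distance instead.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(descriptor, vocabulary):
    """Index of the codeword closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(descriptor, vocabulary[i]))


def bow_histogram(descriptors, vocabulary):
    """Count how many descriptors fall on each codeword."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_codeword(d, vocabulary)] += 1
    return hist


# Hypothetical vocabulary of three codewords and five descriptors from one image
toy_vocabulary = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
toy_descriptors = [(0.5, 0.2), (9.0, 1.0), (9.5, -0.5), (1.0, 9.0), (0.1, 0.1)]

toy_hist = bow_histogram(toy_descriptors, toy_vocabulary)  # [2, 2, 1]
```

The resulting list is the BoW representation of the image: one count per codeword, independent of where in the image the features were found.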
Feature extractor algorithms
- Feature detection –> descriptor extraction
- e.g. SIFT, SURF and ORB are algorithms for feature detection and description
Vocabulary building
- Clusters are made from the descriptors
- Clustering algorithms: e.g. k-means, DBSCAN
- Vocabulary (codewords) consists of the centres of the clusters, i.e. one vocabulary word summarises a group of similar descriptors
(descriptor space)
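Vocabulary building with k-means can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not a production implementation: the descriptors are made-up 2-D points, the function names are hypothetical, and the starting centres are chosen with a simple farthest-point heuristic (an assumption, picked so the result is deterministic) rather than the random initialisation k-means usually uses.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def farthest_point_init(descriptors, k):
    """Start from the first descriptor, then repeatedly add the point
    farthest from all centres chosen so far (assumed heuristic)."""
    centres = [descriptors[0]]
    while len(centres) < k:
        centres.append(max(descriptors,
                           key=lambda d: min(squared_distance(d, c) for c in centres)))
    return centres


def build_vocabulary(descriptors, k, max_iterations=20):
    """Lloyd's k-means: returns k cluster centres, used as the codewords."""
    centres = farthest_point_init(descriptors, k)
    for _ in range(max_iterations):
        # Assignment step: put each descriptor in the cluster of its nearest centre
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            idx = min(range(k), key=lambda i: squared_distance(d, centres[i]))
            clusters[idx].append(d)
        # Update step: move each centre to the mean of its cluster
        # (an empty cluster keeps its previous centre)
        new_centres = [mean(c) if c else centres[i] for i, c in enumerate(clusters)]
        if new_centres == centres:
            break  # assignments no longer change
        centres = new_centres
    return centres


# Two well-separated groups of toy descriptors
toy_descriptors = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
                   (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]

vocabulary = build_vocabulary(toy_descriptors, k=2)
# vocabulary == [(0.5, 0.5), (10.5, 10.5)]
```

Each returned centre is one codeword that stands in for the four descriptors around it. In practice the descriptors from all training images are pooled before clustering, and k (the vocabulary size) is a tuning parameter.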