unet-paper

Source: U-Net paper

Contributions

  • Architecture + training strategy using data augmentation
  • Best-performing network on the ISBI challenge for segmentation of neuronal structures in electron microscopy stacks, and winner of the ISBI cell tracking challenge 2015 (light microscopy images)

History

  • Up till now, CNN mainly for image classification: image –> single label
  • Problems in biomedical image processing
    • localisation also necessary: which class label belongs to which pixel?
    • sparse dataset (not enough annotated data)
  • Ciresan et al.: network in a sliding-window setup to predict each pixel’s class label (see the sketch after this list)
    • provides patch around pixel as input
    • localisation is successful
    • patches yield far more training samples than the images themselves (1 image –> many patches)
    • drawbacks:
      • slow: the network is run separately for each patch
      • redundancy: overlapping patches
      • trade-off between localisation accuracy and use of context
        • better localisation: small patches, but NW sees little context (bad for classification)
        • better context: large patches, but max pooling layers reduce localisation accuracy
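
A rough PyTorch sketch of that sliding-window setup (not Ciresan’s code; `patch_net` and the 65-pixel patch size are illustrative placeholders). The double loop makes the drawbacks concrete: one forward pass per pixel, with neighbouring patches overlapping almost entirely.

```python
import torch
import torch.nn.functional as F

def sliding_window_labels(image, patch_net, patch_size=65):
    """Classify every pixel from the patch centred on it (sliding-window setup).
    image: (C, H, W) tensor; patch_net: any classifier mapping a patch to class logits.
    """
    c, h, w = image.shape
    half = patch_size // 2
    # Mirror-pad so border pixels also get a full patch of context.
    padded = F.pad(image.unsqueeze(0), (half, half, half, half), mode="reflect")[0]
    labels = torch.zeros(h, w, dtype=torch.long)
    for y in range(h):          # one forward pass per pixel -> slow
        for x in range(w):      # neighbouring patches overlap almost entirely -> redundant
            patch = padded[:, y:y + patch_size, x:x + patch_size]
            labels[y, x] = patch_net(patch.unsqueeze(0)).argmax()
    return labels
```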

Idea

Want both: localisation accuracy, and classification accuracy (use of context)

  • For context: use features from multiple layers –> skip connections
  • Pooling is replaced by upsampling –> increase resolution of the output
    • We want learnable upsampling
    • In U-Net this matters because context is propagated from lower-res to higher-res layers
  • For localisation: high-res features from the contracting path are combined with the upsampled output (sketched under Architecture below)
  • Tiling strategy: for dealing with large images that won’t fit on the GPU
    • “The segmentation map only contains the pixels for which the full context is available in the input image.”
    • Prediction of the segmentation outputs a smaller area (yellow) than is given as input (blue) – prediction of the yellow area requires the blue area as input (see the size sketch after this list)
  • (also instance segmentation, separation of touching cells)
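
The size bookkeeping behind the yellow/blue picture: all 3×3 convolutions in the paper are unpadded (“valid”), so a 572×572 input tile shrinks to a 388×388 output map, and only those output pixels have their full context inside the tile; large images are covered by overlapping tiles. A small sketch of that arithmetic (the function is mine; the numbers match the paper’s Fig. 1):

```python
def unet_output_size(tile=572, levels=4):
    """Spatial size under unpadded ("valid") 3x3 convolutions:
    each double-conv trims 4 pixels, 2x2 pooling halves, 2x2 up-conv doubles."""
    s = tile
    for _ in range(levels):   # contracting path
        s = (s - 4) // 2      # two valid 3x3 convs, then 2x2 max pool
    s -= 4                    # bottleneck double-conv
    for _ in range(levels):   # expanding path
        s = s * 2 - 4         # 2x2 up-conv, crop the skip, two valid 3x3 convs
    return s

print(unet_output_size(572))  # -> 388: only these pixels have full context (yellow vs. blue area)
```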

Architecture
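
A minimal PyTorch sketch of the U-shape: a contracting path, learnable up-convolutions, and skip connections that concatenate high-res encoder features into the expanding path. This is a toy version (2 resolution levels, padded convolutions, few channels), not the paper’s network, which uses 4 levels, unpadded convolutions and up to 1024 feature channels.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convs + ReLU. padding=1 keeps sizes equal for simplicity;
    # the paper uses unpadded ("valid") convs, so its feature maps shrink.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, num_classes=2):
        super().__init__()
        self.enc1 = double_conv(in_ch, 64)          # contracting path
        self.enc2 = double_conv(64, 128)
        self.bottleneck = double_conv(128, 256)
        self.pool = nn.MaxPool2d(2)
        # Learnable upsampling (transposed conv), as motivated above.
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.dec2 = double_conv(256, 128)           # 256 = 128 (skip) + 128 (upsampled)
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = double_conv(128, 64)
        self.head = nn.Conv2d(64, num_classes, 1)   # per-pixel class scores

    def forward(self, x):
        e1 = self.enc1(x)                     # high-res features, kept for the skip
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([e2, self.up2(b)], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([e1, self.up1(d2)], dim=1))  # skip connection
        return self.head(d1)                  # (N, num_classes, H, W)

# usage: TinyUNet()(torch.randn(1, 1, 64, 64)).shape -> torch.Size([1, 2, 64, 64])
```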

Questions

  • What does use of context mean?
    Use of surrounding ‘patch’ of image

Notes

https://towardsdatascience.com/understanding-semantic-segmentation-with-unet-6be4f42d4b47

  • Regular CNN: we get class ‘what?’ but lose the location information ‘where’ – pure classification
  • What is wanted: semantic segmentation: what + where, i.e. a class label for each pixel (see the shape sketch below)
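
A shape-level illustration of that contrast (toy modules, not from the paper): a classifier collapses the spatial grid to one label per image, while a fully convolutional segmenter keeps the grid and scores every pixel.

```python
import torch
import torch.nn as nn

images = torch.randn(4, 1, 64, 64)     # batch of 4 grayscale images

# Plain classifier: global pooling discards "where", keeps only "what".
classifier = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
)
print(classifier(images).shape)   # torch.Size([4, 2])         -> one label per image

# Fully convolutional segmenter: keeps the spatial grid, so "what" + "where".
segmenter = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 2, 1),          # 2 class scores per pixel
)
print(segmenter(images).shape)    # torch.Size([4, 2, 64, 64])  -> one label per pixel
```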