CNN W3L05 : Bounding Box Predictions

Convolutional Neural Networks (CNN) Week 3 Lecture 5: Bounding Box Predictions
4 Responses to "CNN W3L05 : Bounding Box Predictions"

  1. From the description of the YOLO algorithm in this video, am I correct in thinking that the training data should be chosen so that every grid cell contains objects in sufficiently many training images? For example, suppose that no object ever appears in the top-left grid cell in any of our training images. Then the network will not learn to adequately detect objects there, right?

    This leads me to think that the different grid cells don't quite share knowledge among one another. That is, they all have to be trained individually to some extent, by providing training data with objects within them.

    Thoughts, anyone?

  2. Why is matching the exact position of an object a problem in the sliding window approach? If the stride is 1, the window visits every pixel of the image (say, with a 14×14 box centered at each pixel), so it covers every possible location in the image and will therefore match the exact position of an object (its center). The problem arises only when we use a bigger stride.
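
The stride argument in the second comment can be checked with a small sketch. It assumes a single image axis, a hypothetical 28-pixel image, and it tracks the window's top-left offset rather than its center (the two differ only by a constant shift). The helper names `window_starts` and `worst_offset` are made up for this illustration. The sketch only measures position; it says nothing about box size or aspect ratio, which a fixed 14×14 window cannot adapt to at any stride.

```python
def window_starts(image_size, window, stride):
    """Top-left offsets of every window position along one axis."""
    return list(range(0, image_size - window + 1, stride))


def worst_offset(image_size, window, stride):
    """Largest gap between a possible object offset and the nearest window offset."""
    starts = window_starts(image_size, window, stride)
    # Every offset at which a window-sized object could sit inside the image.
    positions = range(0, image_size - window + 1)
    return max(min(abs(p - s) for s in starts) for p in positions)


# Assumed sizes: a 28-pixel axis and the 14-pixel window from the comment.
for stride in (1, 2, 4, 8):
    print(stride, worst_offset(image_size=28, window=14, stride=stride))
```

With a stride of 1 the worst-case gap is zero, which matches the comment; the gap grows as the stride increases.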

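To check the concern from the first comment on a concrete dataset, one could count how many labelled object midpoints land in each grid cell. The sketch below assumes that an object is assigned to the grid cell containing its midpoint, a 3×3 grid, and midpoints given as (x, y) pairs normalized to the image width and height. The function name `cell_counts` and the sample data are hypothetical.

```python
def cell_counts(midpoints, grid=3):
    """Count object midpoints per grid cell, indexed as counts[row][col]."""
    counts = [[0] * grid for _ in range(grid)]
    for x, y in midpoints:
        # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        counts[row][col] += 1
    return counts


# Hypothetical midpoints (x, y), normalized to [0, 1].
sample = [(0.5, 0.5), (0.9, 0.1), (0.45, 0.6), (0.7, 0.8)]
for row in cell_counts(sample):
    print(row)
```

A cell whose count stays at zero across the whole training set, like the top-left cell in this sample, is the situation the comment asks about.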
