Each computed mask is simply resized to the width and height of its corresponding computed bounding box. For example, using OpenCV:

mask = cv2.resize(mask, (bboxW, bboxH), interpolation=cv2.INTER_NEAREST)

Then threshold the resized mask to convert it to a binary mask, and overlay that binary mask on the input image at the coordinates of the corresponding computed bounding box.
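As a rough sketch of the threshold-and-overlay step, assuming the mask has already been resized to the box size as above: the function name `overlay_mask`, the `(startX, startY, endX, endY)` box layout, the 0.5 threshold, and the colour/alpha blending are all illustrative choices, not something the model or OpenCV prescribes.

```python
import numpy as np

def overlay_mask(image, mask, box, threshold=0.5, color=(0, 255, 0), alpha=0.5):
    """Blend a mask into the bounding-box region of a copy of `image`.

    image: HxWx3 uint8 array.
    mask:  float array already resized to (bboxH, bboxW).
    box:   (startX, startY, endX, endY) in pixels (assumed layout).
    """
    startX, startY, endX, endY = box
    # Threshold the soft mask into a boolean (binary) mask.
    binary = mask > threshold
    output = image.copy()
    # Slice out the bounding-box region; this is a view into `output`.
    roi = output[startY:endY, startX:endX]
    # Blend the chosen colour into the masked pixels only.
    blended = (1 - alpha) * roi[binary] + alpha * np.array(color, dtype=np.float64)
    roi[binary] = blended.astype(np.uint8)
    return output
```

Pixels inside the box where the mask is at or below the threshold are left untouched, so only the object itself is tinted rather than the whole rectangle.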

See here for more details.