Image Captioning — MCQs | Digital Image Processing

1. Which deep learning model is most commonly used for generating image captions?

(A) Convolutional Neural Network


(B) Recurrent Neural Network


(C) CNN-RNN hybrid model


(D) Support Vector Machine



2. Which dataset is widely used for training image captioning models?

(A) ImageNet


(B) COCO


(C) CIFAR-10


(D) PASCAL VOC



3. In image captioning, the CNN is mainly responsible for:

(A) Text generation


(B) Image classification


(C) Feature extraction from images


(D) Sentence ranking



4. Which neural network component is typically used after the CNN in image captioning?

(A) Decision tree


(B) Recurrent Neural Network


(C) Autoencoder


(D) Transformer Encoder
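
Questions 1–4 describe the classic encoder–decoder pipeline: a CNN extracts image features and a recurrent decoder turns them into words. A minimal PyTorch-style sketch of such a decoder (the class and dimension names are illustrative, not from any specific paper):

```python
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    """Toy RNN decoder: CNN image features seed the hidden state,
    and word embeddings are decoded step by step into word logits."""
    def __init__(self, feat_dim=2048, embed_dim=256, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden_dim)  # image features -> initial hidden state
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)   # hidden state -> scores over the vocabulary

    def forward(self, image_feats, captions):
        h0 = torch.tanh(self.init_h(image_feats)).unsqueeze(0)  # (1, B, H)
        hidden, _ = self.rnn(self.embed(captions), h0)          # (B, T, H)
        return self.out(hidden)                                 # (B, T, vocab_size)
```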



5. The goal of image captioning is to generate:

(A) Object coordinates


(B) Semantic segmentation


(C) Descriptive sentences


(D) Class labels



6. Which architecture improves performance in image captioning by handling long-range dependencies?

(A) CNN


(B) RNN


(C) LSTM


(D) PCA



7. In attention-based models, the attention mechanism helps the model:

(A) Filter noise


(B) Focus on relevant parts of the image


(C) Perform faster computation


(D) Resize the image



8. Which loss function is commonly used for training image captioning models?

(A) Mean Squared Error


(B) Cross Entropy Loss


(C) Hinge Loss


(D) Dice Loss
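
The cross-entropy loss from question 8 is applied at every time step: it is the negative log-probability the model assigns to the correct next word. A small numpy illustration with toy numbers:

```python
import numpy as np

def cross_entropy(logits, target_idx):
    """Negative log-probability of the correct word under a softmax over logits."""
    probs = np.exp(logits - logits.max())   # subtract max for numerical stability
    probs /= probs.sum()
    return -np.log(probs[target_idx])

logits = np.array([2.0, 0.5, -1.0, 0.1])    # scores over a 4-word toy vocabulary
print(cross_entropy(logits, target_idx=0))  # small loss: correct word already scores highest
```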



9. The BLEU score is used in image captioning to measure:

(A) Image quality


(B) Model complexity


(C) Caption accuracy


(D) Segmentation overlap
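
BLEU (questions 9 and 13) scores n-gram overlap between a generated caption and human reference captions. A quick sketch using NLTK, assuming it is installed:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["a", "dog", "runs", "on", "the", "beach"]]         # human caption(s)
candidate = ["a", "dog", "is", "running", "on", "the", "beach"]  # model output

# BLEU counts matching n-grams between candidate and references;
# smoothing avoids zero scores when higher-order n-grams have no match.
score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(round(score, 3))
```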



10. The attention mechanism in image captioning was introduced in which model?

(A) Show and Tell


(B) Show, Attend and Tell


(C) Deep Caption


(D) Visual Genome



11. Image captioning typically combines which two types of data?

(A) Audio and video


(B) Text and metadata


(C) Visual and textual


(D) Numeric and symbolic



12. Which model architecture enables parallel training in caption generation?

(A) RNN


(B) LSTM


(C) Transformer


(D) GAN



13. Which metric evaluates n-gram overlap in caption generation?

(A) SSIM


(B) IoU


(C) BLEU


(D) PSNR



14. The CIDEr metric in image captioning emphasizes:

(A) Text fluency


(B) Syntactic accuracy


(C) Consensus among human captions


(D) Image resolution



15. The encoder in an image captioning model processes:

(A) Captions


(B) Feature vectors


(C) Image input


(D) Evaluation metrics



16. The decoder in image captioning is responsible for:

(A) Extracting features


(B) Resizing images


(C) Generating sentences


(D) Compressing data



17. Which optimization algorithm is commonly used in training captioning models?

(A) Gradient Boosting


(B) Adam


(C) K-means


(D) Simulated Annealing



18. Which of the following is a common challenge in image captioning?

(A) Overfitting to the training data


(B) Poor camera quality


(C) Low pixel density


(D) Absence of RGB values



19. In image captioning, what is “teacher forcing”?

(A) Manually labeling captions


(B) Feeding actual output during training


(C) Using only CNN layers


(D) Encoding data with noise
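
A sketch of teacher forcing (question 19): during training the decoder receives the ground-truth previous word rather than its own prediction. `decoder_step` is a hypothetical one-step decoder, shown only to make the control flow concrete:

```python
def decode(decoder_step, ground_truth, start_token, teacher_forcing=True):
    word, outputs = start_token, []
    for true_word in ground_truth:
        pred = decoder_step(word)   # predict the next word from the current input
        outputs.append(pred)
        # Teacher forcing feeds the *true* word back in, so an early mistake
        # does not derail the rest of the training sequence.
        word = true_word if teacher_forcing else pred
    return outputs
```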



20. What does the term “vocabulary” refer to in image captioning?

(A) Set of image labels


(B) Number of input features


(C) Set of all words used in captions


(D) Collection of model parameters



21. What is beam search used for in caption generation?

(A) Training optimization


(B) Data augmentation


(C) Sequence prediction


(D) Evaluation metric
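
Beam search (question 21) keeps the top-k partial captions at each decoding step instead of committing to a single word. A self-contained sketch; `step_fn(seq)` is a hypothetical callable returning next-word probabilities:

```python
import math

def beam_search(step_fn, start, beam_width=3, max_len=10, end_token="<end>"):
    """Keep the `beam_width` highest-scoring partial captions at each step."""
    beams = [([start], 0.0)]                      # (sequence, log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:
                candidates.append((seq, score))   # finished caption, keep as-is
                continue
            for word, p in step_fn(seq).items():
                candidates.append((seq + [word], score + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]
```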



22. The term “Show and Tell” refers to:

(A) A captioning dataset


(B) A training tool


(C) A deep learning model


(D) A loss function



23. Which layer captures time-dependent patterns in sequence generation?

(A) Dense Layer


(B) Convolution Layer


(C) Recurrent Layer


(D) Normalization Layer



24. What is the main input to the decoder during testing in image captioning?

(A) True label


(B) Previous word prediction


(C) Random noise


(D) Entire image
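
Question 24's point in code: at test time the decoder consumes its own previous prediction. This greedy loop also shows the limitation flagged in question 47, since each locally best word may miss a better overall caption. `step_fn` is again a hypothetical next-word scorer:

```python
def greedy_decode(step_fn, start="<start>", end="<end>", max_len=15):
    """Each predicted word becomes the next input to the decoder."""
    seq = [start]
    for _ in range(max_len):
        probs = step_fn(seq)
        word = max(probs, key=probs.get)   # always take the single best word
        seq.append(word)
        if word == end:
            break
    return seq
```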



25. Which of the following is a pre-trained model commonly used for feature extraction in image captioning?

(A) VGG16


(B) GPT-2


(C) YOLO


(D) UNet
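
A hedged sketch of question 25's idea: using a pre-trained VGG16 as a frozen feature extractor (assumes TensorFlow/Keras is available; the ImageNet weights download on first use):

```python
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# include_top=False drops the classifier; global average pooling yields one vector.
model = VGG16(weights="imagenet", include_top=False, pooling="avg")
image = np.random.rand(1, 224, 224, 3) * 255        # stand-in for a real photo
features = model.predict(preprocess_input(image))
print(features.shape)                               # (1, 512) feature vector
```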



26. What does the “context vector” in attention models represent?

(A) Evaluation result


(B) Caption score


(C) Weighted image features


(D) Learning rate
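
The context vector from question 26, computed directly: attention scores are softmax-normalized and used to weight region features (a numpy sketch with made-up dimensions):

```python
import numpy as np

def context_vector(region_feats, scores):
    """Weighted sum of region features; weights come from a softmax
    over the attention scores."""
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ region_feats            # (D,) weighted image features

regions = np.random.rand(49, 512)            # e.g. a 7x7 CNN grid, 512-d per cell
scores = np.random.rand(49)                  # relevance of each region to the next word
print(context_vector(regions, scores).shape) # (512,)
```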



27. Why is dropout used in image captioning models?

(A) To reduce model size


(B) To improve image clarity


(C) To prevent overfitting


(D) To increase data throughput



28. Which method improves robustness of captioning models?

(A) Label smoothing


(B) Histogram equalization


(C) Pixel quantization


(D) Dilation



29. Which of the following is a captioning benchmark dataset?

(A) VOC 2007


(B) Open Images


(C) Flickr8k


(D) LFW



30. What role does a tokenizer play in image captioning?

(A) Enhances image edges


(B) Segments objects


(C) Converts sentences into word indices


(D) Compresses feature vectors
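
A minimal tokenizer in the sense of question 30, mapping caption words to integer indices (real pipelines typically use a library tokenizer, but the principle is the same):

```python
# Build a word-to-index vocabulary from training captions.
captions = ["a dog runs", "a cat sleeps"]
vocab = {"<pad>": 0, "<start>": 1, "<end>": 2}
for sentence in captions:
    for word in sentence.split():
        vocab.setdefault(word, len(vocab))

def encode(sentence):
    """Convert a sentence into a list of word indices with boundary tokens."""
    return [vocab["<start>"]] + [vocab[w] for w in sentence.split()] + [vocab["<end>"]]

print(encode("a dog sleeps"))   # e.g. [1, 3, 4, 7, 2]
```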



31. Which deep learning technique allows for generating varied captions for the same image?

(A) Deterministic decoding


(B) Greedy search


(C) Stochastic sampling


(D) Image resizing
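
Stochastic sampling (question 31) draws the next word from the predicted distribution instead of always taking the argmax, so the same image can yield different captions. A numpy sketch with a temperature knob:

```python
import numpy as np

def sample_word(logits, temperature=1.0, rng=np.random.default_rng()):
    """Draw the next word index from the softmax distribution."""
    logits = np.asarray(logits) / temperature   # temperature < 1 sharpens, > 1 flattens
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

print(sample_word([2.0, 1.5, 0.1, -1.0]))       # index varies across calls
```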



32. What is “caption diversity” in image captioning?

(A) Image resolution variance


(B) Number of objects detected


(C) Variety of expressions for the same content


(D) Object detection accuracy



33. Which transformer-based model is adapted for image captioning?

(A) BERT


(B) ResNet


(C) ViLT


(D) Vision Transformer (ViT)



34. What does the term “visual grounding” mean in captioning?

(A) Aligning image regions with textual phrases


(B) Training with GPU


(C) Reducing model size


(D) Labeling background



35. Which of the following can improve caption fluency?

(A) Increased dropout


(B) Sentence embedding


(C) Word repetition


(D) Random word shuffling



36. Which evaluation metric accounts for semantic similarity in captions?

(A) BLEU


(B) CIDEr


(C) METEOR


(D) SSIM



37. Which of the following is not an image captioning evaluation metric?

(A) ROUGE


(B) BLEU


(C) CIDEr


(D) RMSE



38. In a captioning model, which layer is most likely used at the end of the decoder?

(A) Softmax layer


(B) Max-pooling layer


(C) Dropout layer


(D) Convolutional layer



39. Which token usually marks the start of a generated caption sequence?

(A) [CLS]


(B) <start>


(C) <end>


(D) <pad>



40. What does “end-to-end training” mean in image captioning?

(A) Only training CNN part


(B) Only training RNN part


(C) Training entire model together


(D) Using pretrained decoder



41. Which of these is used for refining captions after generation?

(A) Caption synthesizer


(B) Post-processing heuristic


(C) Language model reranking


(D) Object detector



42. Spatial attention mainly operates on which image representation?

(A) Image metadata


(B) Entire image as a single vector


(C) Region-specific features


(D) Histogram of intensities



43. Why are hierarchical models used in captioning?

(A) For filtering noise


(B) For faster computation


(C) To model sentence structures


(D) For compressing features



44. In self-critical sequence training (SCST), reward is computed using:

(A) Decoder weights


(B) CNN loss


(C) Evaluation metric like CIDEr


(D) Batch normalization
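
The SCST reward from question 44, in sketch form: the evaluation metric scores a sampled caption, and the greedy caption's score serves as the baseline. `cider` here stands for a hypothetical metric function, not a library call:

```python
def scst_advantage(cider, sampled_caption, greedy_caption, references):
    """Reward minus greedy baseline; a positive value reinforces the sample."""
    reward = cider(sampled_caption, references)    # metric score of the sampled caption
    baseline = cider(greedy_caption, references)   # greedy decoding acts as the baseline
    return reward - baseline
```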



45. Which of the following helps model rare words in captions?

(A) Dropout


(B) Beam width


(C) Subword tokenization


(D) Feature normalization
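
Subword tokenization (question 45) lets a model represent rare words as known pieces. A toy greedy longest-match segmenter over an assumed subword vocabulary:

```python
# Rare words are broken into known pieces, so "snowboarding" is still
# representable even if it never appeared as a whole word in training.
subword_vocab = {"snow", "board", "ing", "sun", "set"}

def segment(word):
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # greedy longest match
            if word[i:j] in subword_vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])          # fall back to single characters
            i += 1
    return pieces

print(segment("snowboarding"))   # ['snow', 'board', 'ing']
```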



46. Which method avoids repetition in captioning outputs?

(A) Greedy decoding


(B) N-gram blocking


(C) Convolutional pooling


(D) Object tracking
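
N-gram blocking (question 46) rejects any word that would recreate an n-gram already present in the partial caption. A small sketch:

```python
def violates_ngram_block(seq, next_word, n=2):
    """True if appending `next_word` would repeat an existing n-gram."""
    candidate = seq + [next_word]
    ngram = tuple(candidate[-n:])   # the n-gram the new word would create
    existing = {tuple(candidate[i:i + n]) for i in range(len(candidate) - n)}
    return ngram in existing

seq = ["a", "dog", "chasing", "a"]
print(violates_ngram_block(seq, "dog"))   # True: bigram ('a', 'dog') repeats
print(violates_ngram_block(seq, "ball"))  # False
```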



47. What is a major limitation of greedy decoding?

(A) High training time


(B) Requires labeled bounding boxes


(C) Misses better global sequences


(D) Increases vocabulary



48. How is the quality of generated captions usually assessed?

(A) Histogram matching


(B) Human evaluation and automated metrics


(C) Color quantization


(D) Model size



49. Which component in an image captioning model interprets visual data into a fixed-size representation?

(A) Decoder


(B) Tokenizer


(C) Encoder


(D) Softmax Layer



50. Which of the following best describes a key advantage of using Transformers in image captioning?

(A) Faster image rendering


(B) Better spatial resolution


(C) Parallel processing of sequences


(D) Reduced memory usage



More MCQs on Digital Image Processing

  1. Introduction to DIP — MCQs | Digital Image Processing

  2. Human Visual System (HVS) — MCQs | Digital Image Processing

  3. Image Acquisition Devices — MCQs | Digital Image Processing

  4. Image Sampling & Quantization — MCQs | Digital Image Processing

  5. Image Resolution & Bit Depth — MCQs | Digital Image Processing

  6. Basic Image Operations (Negative, Log, Power-law) — MCQs | Digital Image Processing

  7. Histogram Equalization & Specification — MCQs | Digital Image Processing

  8. Contrast Stretching — MCQs | Digital Image Processing

  9. Image Arithmetic (Add, Subtract, Multiply, Divide) — MCQs | Digital Image Processing

  10. Bit-plane Slicing — MCQs | Digital Image Processing

  11. Smoothing Filters (Mean, Gaussian, Median) — MCQs | Digital Image Processing

  12. Sharpening Filters (Laplacian, Gradient) — MCQs | Digital Image Processing

  13. High-Boost Filtering — MCQs | Digital Image Processing

  14. Edge Detection (Sobel, Prewitt, Roberts, Canny, LoG) — MCQs | Digital Image Processing

  15. Fourier Transform (DFT, FFT) — MCQs | Digital Image Processing

  16. Frequency Domain Filtering — MCQs | Digital Image Processing

  17. Low-pass & High-pass Filters — MCQs | Digital Image Processing

  18. Homomorphic Filtering — MCQs | Digital Image Processing

  19. Noise Models (Gaussian, Salt & Pepper, Speckle) — MCQs | Digital Image Processing

  20. Adaptive Filtering — MCQs | Digital Image Processing

  21. Inverse & Wiener Filtering — MCQs | Digital Image Processing

  22. Pseudo-color & True-color Processing — MCQs | Digital Image Processing

  23. Color Space Conversion (RGB ↔ HSV, HSI, YCbCr) — MCQs | Digital Image Processing

  24. Color Image Enhancement — MCQs | Digital Image Processing

  25. Image Segmentation (Thresholding, Otsu, K-means, Region Growing) — MCQs | Digital Image Processing

  26. Edge-based Segmentation — MCQs | Digital Image Processing

  27. Region Splitting and Merging — MCQs | Digital Image Processing

  28. Watershed Algorithm — MCQs | Digital Image Processing

  29. Morphological Operations (Erosion, Dilation, Opening, Closing) — MCQs | Digital Image Processing

  30. Boundary Extraction — MCQs | Digital Image Processing

  31. Skeletonization — MCQs | Digital Image Processing

  32. Connected Components Labeling — MCQs | Digital Image Processing

  33. Texture Analysis (GLCM, LBP, Gabor Filters) — MCQs | Digital Image Processing

  34. Shape Descriptors (Perimeter, Area, Compactness, Eccentricity) — MCQs | Digital Image Processing

  35. Statistical Features (Mean, Variance, Skewness) — MCQs | Digital Image Processing

  36. Principal Component Analysis (PCA) — MCQs | Digital Image Processing

  37. Linear Discriminant Analysis (LDA) — MCQs | Digital Image Processing

  38. Feature Matching (SIFT, SURF, ORB) — MCQs | Digital Image Processing

  39. Image Registration — MCQs | Digital Image Processing

  40. Image Stitching — MCQs | Digital Image Processing

  41. Motion Detection & Optical Flow — MCQs | Digital Image Processing

  42. Background Subtraction — MCQs | Digital Image Processing

  43. Object Detection & Tracking — MCQs | Digital Image Processing

  44. Template Matching — MCQs | Digital Image Processing

  45. Pattern Recognition (KNN, SVM, ANN) — MCQs | Digital Image Processing

  46. Image Classification — MCQs | Digital Image Processing

  47. Image Clustering — MCQs | Digital Image Processing

  48. Image Compression (RLE, Huffman, LZW, JPEG, JPEG2000) — MCQs | Digital Image Processing

  49. Video Compression (MPEG, H.264) — MCQs | Digital Image Processing

  50. Image Fusion (Pixel, Feature, Decision Level) — MCQs | Digital Image Processing

  51. Image Watermarking — MCQs | Digital Image Processing

  52. Steganography — MCQs | Digital Image Processing

  53. Face Detection & Recognition — MCQs | Digital Image Processing

  54. Gesture Recognition — MCQs | Digital Image Processing

  55. 3D Image Processing — MCQs | Digital Image Processing

  56. Stereo Vision & Depth Estimation — MCQs | Digital Image Processing

  57. Medical Image Analysis (CT, MRI, Ultrasound) — MCQs | Digital Image Processing

  58. Remote Sensing Image Processing — MCQs | Digital Image Processing

  59. Satellite Image Enhancement — MCQs | Digital Image Processing

  60. Deep Learning for Image Processing (CNN, GANs, Autoencoders) — MCQs | Digital Image Processing

  61. Image Captioning — MCQs | Digital Image Processing

  62. Semantic & Instance Segmentation (Mask R-CNN, U-Net) — MCQs | Digital Image Processing

  63. Super Resolution (SRCNN, ESRGAN) — MCQs | Digital Image Processing

  64. Image Inpainting — MCQs | Digital Image Processing

  65. Image Style Transfer — MCQs | Digital Image Processing

  66. Real-Time Image Processing — MCQs | Digital Image Processing

  67. Augmented Reality (AR) & Virtual Reality (VR) — MCQs | Digital Image Processing

  68. DIP using MATLAB/OpenCV/Python — MCQs | Digital Image Processing

  69. DIP in IoT & Embedded Systems — MCQs | Digital Image Processing

  70. Ethics & Privacy in Image Processing — MCQs | Digital Image Processing
