Image super-resolution through convolutions, hierarchical vision transformers with shifted windows, and neighbor interpolation


Saved in:
Bibliographic Details
Author: Pijal Toapanta, Washington Danilo (author)
Format: bachelorThesis
Language: eng
Published: 2023
Subjects:
Online Access: http://repositorio.yachaytech.edu.ec/handle/123456789/622
Description
Summary: This research project focuses on image super-resolution (SR), implementing convolutions, vision transformers with shifted windows, and neighbor interpolation to enhance image resolution at an upscale factor of four. These components form the three modules of the proposed transformer-based SR architecture (SwinIR-OH): shallow feature extraction, consisting of convolution layers; deep feature extraction, containing residual blocks of vision transformers with shifted windows; and SR image reconstruction, which includes convolutions and neighbor interpolation.

Recent years have witnessed remarkable progress in SR using deep learning techniques. However, deep-learning SR algorithms differ in several significant aspects: network architectures, loss functions, learning principles, and training strategies. Therefore, to study the effect of convolutions in the transformer-based SR architecture more rigorously, all the state-of-the-art SR models presented in this research were trained in the same computational environment. They were selected considering the availability of their source code, the mean peak signal-to-noise ratio (PSNR), and the mean structural similarity index measure (SSIM). The SR models belong to five existing families of methods: graph neural networks, residual networks, attention-based networks, generative adversarial networks, and vision transformers. On the other hand, the results obtained during model training show that traditional SR image reconstruction quality metrics (IRQM), such as PSNR and SSIM, correlate poorly with human perception of image quality, making it challenging to assess the performance of SR models. These results open the possibility of considering alternatives such as visual information fidelity and the sparse correlation coefficient as potential IRQMs for measuring the performance of SR models.
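To make the PSNR discussion above concrete, the following is a minimal sketch of the metric (not the thesis code; the helper name `psnr` and the NumPy-only setting are assumptions). It also illustrates the abstract's point: PSNR penalizes every pixel-wise deviation equally, regardless of how a human would perceive it.

```python
import numpy as np

def psnr(reference, reconstruction, max_val=255.0):
    """Peak signal-to-noise ratio between two images (higher is better)."""
    ref = reference.astype(np.float64)
    rec = reconstruction.astype(np.float64)
    mse = np.mean((ref - rec) ** 2)               # mean squared error
    return 10.0 * np.log10(max_val ** 2 / mse)    # in decibels

# A uniform error of 16 gray levels on an 8-bit image:
ref = np.zeros((4, 4))
rec = np.full((4, 4), 16.0)
print(round(psnr(ref, rec), 2))  # 24.05
```

Because the mean squared error is blind to spatial structure, two reconstructions with the same PSNR can look very different to a viewer, which motivates the perceptual alternatives mentioned above.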
The results also indicate that inserting sequences of convolutions into the transformer-based SR image reconstruction architecture improves performance during SR image reconstruction, recovering fine details, such as the eyelashes of a portrait, that would otherwise be lost in the deep feature extraction module or the SR reconstruction module.
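The reconstruction idea described above can be sketched as follows. This is an illustrative NumPy-only toy, not the thesis implementation: nearest-neighbor interpolation supplies the ×4 upscale, and a convolution afterwards refines the blocky result (here a simple 3×3 averaging kernel stands in for the learned convolution layers).

```python
import numpy as np

def nearest_neighbor_upscale(img, scale=4):
    """Upscale a 2-D image by repeating each pixel `scale` times per axis."""
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)

def conv2d_same(img, kernel):
    """Naive 'same'-padded 2-D filtering used here to refine the upscale."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(img, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

lr = np.arange(4.0).reshape(2, 2)                    # tiny "low-resolution" input
hr = nearest_neighbor_upscale(lr)                    # blocky 8x8 upscale
refined = conv2d_same(hr, np.full((3, 3), 1 / 9.0))  # smoothing stands in for a learned conv
print(hr.shape)  # (8, 8)
```

In the actual architecture the interpolation is interleaved with learned convolution layers rather than a fixed averaging kernel, which is what lets the reconstruction module recover fine detail instead of merely smoothing the blocks.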