Image super-resolution through convolutions, hierarchical vision transformer with shifted Windows, and neighbor interpolation
This research project focuses on image super-resolution (SR), implementing convolutions, vision transformers with shifted windows, and neighbor interpolation to upscale images by a factor of four. These components form part of the three modules of the proposed SR architecture ba...
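
To make the "upscale by a factor of four" step concrete, here is a minimal sketch of neighbor interpolation. It assumes the term means plain nearest-neighbor interpolation, where each low-resolution pixel is repeated into a 4×4 block. The function name `upscale_nearest` and the list-of-rows image layout are hypothetical choices for illustration, and the convolution layers that the thesis combines with this step are not shown.

```python
def upscale_nearest(image, factor=4):
    """Upscale a 2-D grid of pixel values by repeating each pixel.

    `image` is a list of rows; every row is a list of pixel values.
    Each source pixel becomes a factor x factor block in the output.
    (Hypothetical helper, not the thesis implementation.)
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    upscaled = []
    for row in image:
        # Repeat every pixel `factor` times horizontally.
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # Repeat the widened row `factor` times vertically, copying each time.
        upscaled.extend(list(wide_row) for _ in range(factor))
    return upscaled


low_res = [[10, 20], [30, 40]]
high_res = upscale_nearest(low_res, factor=4)
print(len(high_res), len(high_res[0]))  # 8 8
```

Nearest-neighbor upsampling adds no new detail on its own, which is presumably why a reconstruction module would pair it with convolutions that refine the enlarged feature maps.
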
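| Main author: | Pijal Toapanta, Washington Danilo |
|---|---|
| Format: | bachelorThesis |
| Language: | eng |
| Published: | 2023 |
| Subjects: | Visión artificial; Transformadores de visión; Resolución de imagen; Computer vision; Vision transformers; Neighbor interpolation |
| Online access: | http://repositorio.yachaytech.edu.ec/handle/123456789/622 |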
| _version_ | 1858461410950905856 |
|---|---|
| author | Pijal Toapanta, Washington Danilo |
| author_facet | Pijal Toapanta, Washington Danilo |
| author_role | author |
| collection | Repositorio Universidad Yachay Tech |
| dc.contributor.none.fl_str_mv | Morocho Cayamcela, Manuel Eugenio |
| dc.creator.none.fl_str_mv | Pijal Toapanta, Washington Danilo |
| dc.date.none.fl_str_mv | 2023-06-05T09:40:52Z 2023-06-05T09:40:52Z 2023-05 |
| dc.format.none.fl_str_mv | application/pdf |
| dc.identifier.none.fl_str_mv | http://repositorio.yachaytech.edu.ec/handle/123456789/622 |
| dc.language.none.fl_str_mv | eng |
| dc.publisher.none.fl_str_mv | Universidad de Investigación de Tecnología Experimental Yachay |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess |
| dc.source.none.fl_str_mv | reponame:Repositorio Universidad Yachay Tech instname:Universidad Yachay Tech instacron:Yachay |
| dc.subject.none.fl_str_mv | Visión artificial; Transformadores de visión; Resolución de imagen; Computer vision; Vision transformers; Neighbor interpolation |
| dc.title.none.fl_str_mv | Image super-resolution through convolutions, hierarchical vision transformer with shifted Windows, and neighbor interpolation |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/bachelorThesis |
| description | This research project focuses on image super-resolution (SR), implementing convolutions, vision transformers with shifted windows, and neighbor interpolation to upscale images by a factor of four. These components form the three modules of the proposed transformer-based SR architecture (SwinIR-OH): shallow feature extraction, consisting of convolution layers; deep feature extraction, containing residual vision transformer blocks with shifted windows; and SR image reconstruction, comprising convolutions and neighbor interpolation. Recent years have witnessed remarkable progress in SR using deep learning techniques. However, deep-learning SR algorithms differ in several significant aspects: network architectures, loss functions, learning principles, and strategies. For that reason, to study the effect of convolutions in the transformer-based SR architecture more rigorously, all the state-of-the-art SR models presented in this research were trained in the same computational environment. They were selected on the basis of available source code, mean peak signal-to-noise ratio (PSNR), and mean structural similarity index measure (SSIM). The SR models belong to five existing families of methods: graph neural networks, residual networks, attention-based networks, generative adversarial networks, and vision transformers. The results during model training show that traditional SR image reconstruction quality metrics (IRQMs), such as PSNR and SSIM, correlate poorly with human perception of image quality, which makes it difficult to study the performance of SR models. These results open the possibility of considering alternatives such as visual information fidelity and the sparse correlation coefficient as potential IRQMs for measuring the performance of SR models. The results also indicate that adding sequences of convolutions to a transformer-based SR image reconstruction architecture improves reconstruction performance, recovering fine details, such as the eyelashes in a portrait, that without the sequences of convolutions are lost in the deep feature extraction module or the SR reconstruction module. |
| eu_rights_str_mv | openAccess |
| format | bachelorThesis |
| id | Yachay_8f2e3cf6ff04be4e98a8a1301c77319e |
| instacron_str | Yachay |
| institution | Yachay |
| instname_str | Universidad Yachay Tech |
| language | eng |
| network_acronym_str | Yachay |
| network_name_str | Repositorio Universidad Yachay Tech |
| oai_identifier_str | oai:repositorio.yachaytech.edu.ec:123456789/622 |
| publishDate | 2023 |
| publisher.none.fl_str_mv | Universidad de Investigación de Tecnología Experimental Yachay |
| reponame_str | Repositorio Universidad Yachay Tech |
| repository.mail.fl_str_mv | . |
| repository.name.fl_str_mv | Repositorio Universidad Yachay Tech - Universidad Yachay Tech |
| repository_id_str | 10284 |
| spelling | Image super-resolution through convolutions, hierarchical vision transformer with shifted Windows, and neighbor interpolation; Pijal Toapanta, Washington Danilo; Visión artificial; Transformadores de visión; Resolución de imagen; Computer vision; Vision transformers; Neighbor interpolation; This research project focuses on image super-resolution (SR), implementing convolutions, vision transformers with shifted windows, and neighbor interpolation to increase image resolution by a factor of four. These implementations form part of the three main modules of the proposed SR architecture (SwinIR-OH): shallow feature extraction, consisting of a 3×3 convolution layer; deep feature extraction, containing residual vision transformer blocks with shifted windows; and SR image reconstruction, which includes convolutions and neighbor interpolation. Recent years have seen notable progress in SR using deep learning techniques. However, SR algorithms that use these techniques differ in significant aspects: network architectures, loss functions, learning principles, and strategies. For this reason, to study the effect of convolutions in the transformer-based SR architecture more rigorously, all the state-of-the-art SR models presented in this research were trained in the same computational environment. All the SR models belong to five existing families of methods: graph neural networks, residual networks, attention-based networks, generative adversarial networks, and vision transformers. They were selected by considering the available source code, the mean peak signal-to-noise ratio (PSNR), and the mean structural similarity index measure (SSIM) before being trained in the same computational environment. The results during model training show that traditional SR image reconstruction quality metrics (IRQMs), such as PSNR and SSIM, correlate poorly with human perception of image quality and make it difficult to study the performance of an SR model. These results open the possibility of considering alternatives such as visual information fidelity and the sparse correlation coefficient as potential IRQMs for measuring the performance of SR models. Finally, the results indicate that implementing sequences of convolutions in the SR image reconstruction architecture improves performance during SR image reconstruction, recovering fine details, such as the eyelashes in a portrait, that without the sequences of convolutions are lost in the deep feature extraction module or the SR reconstruction module.; Ingeniero/a en Tecnologías de la Información; Universidad de Investigación de Tecnología Experimental Yachay; Morocho Cayamcela, Manuel Eugenio; 2023-06-05T09:40:52Z; 2023-06-05T09:40:52Z; 2023-05; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/bachelorThesis; application/pdf; http://repositorio.yachaytech.edu.ec/handle/123456789/622; eng; info:eu-repo/semantics/openAccess; reponame:Repositorio Universidad Yachay Tech; instname:Universidad Yachay Tech; instacron:Yachay; 2025-07-08T17:56:06Z; oai:repositorio.yachaytech.edu.ec:123456789/622; Institucional; https://repositorio.yachaytech.edu.ec/; Universidad pública; https://www.yachaytech.edu.ec/; https://repositorio.yachaytech.edu.ec/oai; Ecuador...; opendoar:10284; 2025-07-08T17:56:06; Repositorio Universidad Yachay Tech - Universidad Yachay Tech; false |
| spellingShingle | Image super-resolution through convolutions, hierarchical vision transformer with shifted Windows, and neighbor interpolation; Pijal Toapanta, Washington Danilo; Visión artificial; Transformadores de visión; Resolución de imagen; Computer vision; Vision transformers; Neighbor interpolation |
| status_str | publishedVersion |
| title | Image super-resolution through convolutions, hierarchical vision transformer with shifted Windows, and neighbor interpolation |
| title_full | Image super-resolution through convolutions, hierarchical vision transformer with shifted Windows, and neighbor interpolation |
| title_fullStr | Image super-resolution through convolutions, hierarchical vision transformer with shifted Windows, and neighbor interpolation |
| title_full_unstemmed | Image super-resolution through convolutions, hierarchical vision transformer with shifted Windows, and neighbor interpolation |
| title_short | Image super-resolution through convolutions, hierarchical vision transformer with shifted Windows, and neighbor interpolation |
| title_sort | Image super-resolution through convolutions, hierarchical vision transformer with shifted Windows, and neighbor interpolation |
| topic | Visión artificial; Transformadores de visión; Resolución de imagen; Computer vision; Vision transformers; Neighbor interpolation |
| url | http://repositorio.yachaytech.edu.ec/handle/123456789/622 |
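
The description above ranks models by mean PSNR. As a rough illustration of what that metric measures, the sketch below computes PSNR as 10·log10(MAX² / MSE) between a reference image and a reconstruction. It assumes 8-bit intensities (`max_value=255`) and a single-channel list-of-rows layout; the function name `psnr` and these defaults are illustrative assumptions, and the exact evaluation protocol used in the thesis (color space, border cropping) is not stated in this record.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are lists of rows of pixel intensities in [0, max_value].
    (Hypothetical helper; 8-bit range is an assumption.)
    """
    if len(reference) != len(reconstructed):
        raise ValueError("images must have the same height")
    squared_error = 0.0
    count = 0
    for ref_row, rec_row in zip(reference, reconstructed):
        if len(ref_row) != len(rec_row):
            raise ValueError("images must have the same width")
        for ref_px, rec_px in zip(ref_row, rec_row):
            squared_error += (ref_px - rec_px) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    mse = squared_error / count  # mean squared error over all pixels
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [[100, 100], [100, 100]]
reconstruction = [[110, 110], [110, 110]]
print(round(psnr(reference, reconstruction), 2))
```

Because PSNR depends only on per-pixel squared error, two reconstructions with the same score can look very different to a viewer, which is consistent with the abstract's remark that PSNR and SSIM correlate poorly with perceived quality.
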