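Integración de Jetson Nano con Gemini Vision para la interpretación de tablas y gráficos estadísticos en documentos impresos [Integration of Jetson Nano with Gemini Vision for the Interpretation of Tables and Statistical Graphs in Printed Documents]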

Bibliographic details
Main author: Torres Calva, Juan Pablo (author)
Format: bachelorThesis
Language: Spanish (spa)
Published: 2024
Online access: https://dspace.unl.edu.ec/jspui/handle/123456789/30565
Description
Summary: People with visual impairments face significant barriers to accessing information in printed documents, which limits their participation in education, employment, and social interaction and can lead to exclusion in an increasingly information-dependent world. This motivated the development of a "Printed Document Reader for the Visually Impaired", an accessible way for people with visual impairments to read printed documents that addresses the limited availability and high cost of current technologies such as OrCam MyEye glasses and mobile apps from Microsoft and Google. Although useful, these tools are limited when describing complex scenarios. The project integrated Google's Gemini multimodal language model with a Jetson Nano device, was managed under the Scrum framework, and focused on identifying the needs of end users. The key tasks covered planning, hardware and software integration, and programming commands on a numeric keypad that trigger specific functions, such as reading text, reading tables, and describing statistical graphs, so that visual information is converted into accessible formats such as audio. The main result was a functional system that provides accurate descriptions of printed documents through voice synthesis, improving access to information for visually impaired people. The thesis concludes that the proposed solution is viable and efficient, and that it stands out from current market options for its low cost and advanced functionality, enabling greater inclusion and access to information for this group. Keywords: Jetson Nano, Gemini, Visual Impairment, Artificial Intelligence.