Master's Thesis

Mobile Application for Detection and Identification of Visually Similar Objects in Images

Author of thesis: Ing. Lenka Šoková

Acad. year: 2025/2026

Supervisor: Ing. Tomáš Goldmann, Ph.D.

Reviewer: Ing. Filip Orság, Ph.D.

Abstract:

This thesis presents a mobile Android application for detecting and identifying visually similar objects using reference images. The proposed system allows users to store reference images along with object labels and descriptions, and subsequently identify similar objects in a camera stream. The application primarily focuses on plant identification and comparison.

The thesis reviews object detection methods, lightweight neural network architectures, and low-shot and open-vocabulary detection approaches. Due to the computational complexity of existing methods, the proposed solution employs a lightweight hybrid pipeline based on embedding similarity comparison. The application supports multiple target selection approaches, including object detection, segmentation, and manual image cropping.

Several lightweight embedding backbones and loss functions are evaluated and fine-tuned on a plant-specific dataset. The resulting models are further optimized through quantization and pruning and deployed using suitable on-device inference frameworks with hardware acceleration support where available. Experimental results demonstrate that the proposed approach enables practical real-time object identification on mobile devices while balancing accuracy, latency, and model size.
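
The embedding similarity comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each reference image and each camera crop has already been converted to an embedding vector by some backbone. The function names, the toy three-dimensional vectors, the plant labels, and the 0.7 threshold are all hypothetical and are not taken from the thesis.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine of the angle between two embeddings of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def identify(query, gallery, threshold=0.7):
    """Return (label, score) of the most similar reference embedding.

    The label is None when no reference reaches the (assumed) threshold,
    i.e. the object is treated as unknown.
    """
    best_label, best_score = None, -1.0
    for label, reference in gallery:
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Hypothetical gallery: (label, embedding) pairs stored with reference images.
gallery = [
    ("monstera", [0.9, 0.1, 0.0]),
    ("ficus", [0.1, 0.9, 0.2]),
]

label, score = identify([0.8, 0.2, 0.1], gallery)  # label is "monstera"
unknown_label, _ = identify([0.0, 0.0, 1.0], gallery)  # below threshold
```

A threshold of this kind lets the application report "no match" instead of always returning the nearest reference; a suitable value would have to be tuned on real data.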

Keywords:

object detection, few-shot detection, open-vocabulary detection, Android, on-device inference, quantization, embedding similarity, computer vision

Date of defence

23.06.2026

Result of the defence

Defended (thesis was successfully defended)

Grading

B

Process of defence

The student first presented the results achieved in her thesis. The committee then reviewed the supervisor's evaluation and the reviewer's report. The student subsequently answered the reviewer's questions and further questions from those present. Based on the reviewer's report, the supervisor's evaluation, the presentation, and the student's answers to the questions, the committee decided to grade the thesis B.

Topics for thesis defence

  1. How would the proposed system need to be modified to support several visually different domains, such as plants, museum exhibits, and consumer products, at the same time?
  2. What would be the expected bottleneck if the gallery contained thousands of reference images instead of tens of images, and how could this be solved efficiently on a mobile device?
  3. What is the reference image used for?
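
The gallery-size question above can be made concrete with a small sketch. With a linear scan, each query costs one comparison per stored reference, so one plausible mitigation (not necessarily the one discussed at the defence) is to normalize and cache reference embeddings once, so that every comparison reduces to a dot product, and to keep only the top-k candidates. The class name `EmbeddingIndex` and the toy vectors are hypothetical. For much larger galleries, an approximate nearest-neighbour index would be the usual next step, which this sketch does not implement.

```python
import heapq
import math


class EmbeddingIndex:
    """Brute-force index over unit-length embeddings (illustrative only)."""

    def __init__(self):
        self._labels = []
        self._vectors = []

    def add(self, label, embedding):
        """Normalize once at insertion time so queries avoid repeated work."""
        norm = math.sqrt(sum(x * x for x in embedding))
        if norm == 0:
            raise ValueError("embedding must be non-zero")
        self._labels.append(label)
        self._vectors.append([x / norm for x in embedding])

    def top_k(self, query, k=3):
        """Return up to k (label, cosine score) pairs, best match first."""
        norm = math.sqrt(sum(x * x for x in query))
        if norm == 0:
            return []
        q = [x / norm for x in query]
        scored = (
            (sum(a * b for a, b in zip(q, vec)), label)
            for label, vec in zip(self._labels, self._vectors)
        )
        # nlargest keeps only k candidates in memory while scanning.
        return [(label, score) for score, label in heapq.nlargest(k, scored)]


index = EmbeddingIndex()
index.add("a", [1.0, 0.0])
index.add("b", [0.0, 1.0])
index.add("c", [1.0, 1.0])

matches = index.top_k([1.0, 0.1], k=2)  # labels in order: "a", then "c"
```

The scan is still linear in the number of references; the saving comes from doing the normalization once per reference rather than once per query.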

Language of thesis

English

Faculty

Department

Study programme

Information Technology and Artificial Intelligence (MITAI)

Specialization

Machine Learning (NMAL)

Composition of Committee

doc. Ing. Vítězslav Beran, Ph.D. (chair)
prof. Ing. Hynek Heřmanský, Dr. Eng. (vice-chair)
doc. Ing. Ondřej Lengál, Ph.D. (member)
doc. Ing. František Zbořil, Ph.D. (member)
doc. Ing. Michal Bidlo, Ph.D. (member)
RNDr. Marek Rychlý, Ph.D. (member)

Supervisor’s report
Ing. Tomáš Goldmann, Ph.D.

Overall, this is a high-quality master's thesis, both in terms of the technical report and the final application. The student approached the work consistently and systematically, conducted extensive experiments to determine the applicability and limitations of the proposed application, and achieved convincing results. I evaluate the student's approach as excellent (A).

Evaluation criteria Verbal classification
Assignment information

I consider the assignment to be of average difficulty. The goal was to develop an experimental application to verify the feasibility of recognizing selected objects, specifically flowers in the final application. While I regard the assignment itself as average in terms of complexity, I evaluate the resulting work as above average. The student conducted a comprehensive series of relevant experiments focused on object recognition on mobile devices, and the resulting application can serve as a practical foundation for a fully-fledged user-facing application.

Activity during completion

The thesis was completed well ahead of the deadline. I was given sufficient time to review both the final technical report and the application before submission. The majority of the supervisor's comments were taken into account and incorporated into the final version of the technical report.

Publications, awards

No publications or additional awards related to this thesis are known to me.

Work with literature

The student independently gathered all necessary study materials and academic literature without requiring significant guidance from the supervisor. The literature review is well-structured and corresponds appropriately to the topic being addressed.

Activity during the work, consultations, communication

Throughout the development of the thesis, the student regularly attended consultations, always arriving well-prepared. She actively discussed key milestones with me and approached any issues that arose with initiative and thoughtfulness. Communication remained at a good level throughout the entire duration of the project.

Points proposed by supervisor: 99

Grade proposed by supervisor: A

Reviewer’s report
Ing. Filip Orság, Ph.D.

The diploma thesis addresses a demanding and relevant topic at the intersection of computer vision, mobile deployment, and applied machine learning. The student successfully designed and implemented a complete Android application and supported the implementation with extensive model training, optimization, and evaluation. The work is technically mature, well documented, and practically oriented. The main weaknesses are the domain limitation to plants and the relatively limited systematic usability evaluation. However, these limitations are clearly discussed and do not substantially reduce the overall quality of the work. I evaluate the thesis as excellent.

Evaluation criteria Verbal classification Points
Extent of fulfilment of the assignment requirements

Evaluation level: assignment fulfilled

Length of the technical report

Evaluation level: within the usual range

Presentation quality of the technical report

The thesis is well structured and logically organized. The theoretical chapters introduce object detection, mobile deployment constraints, low-shot learning, open-vocabulary detection, and metric learning before moving to the proposed method and experiments. The report also benefits from diagrams and screenshots, especially the pipeline diagram and the appendix showing the application interface. Minor weaknesses are that some theoretical sections are rather broad and could be more tightly connected to the final implementation, and some parts of the evaluation would benefit from a more concise summary for the reader.

Points: 95

Formal quality of the technical report

The formal quality of the report is very good. The English is clear and generally professional. Figures, tables, equations, and references are used appropriately. The thesis is typographically consistent and readable. I did not notice any major formal problems.

Points: 100

Work with literature

The work with literature is very good. The thesis uses a broad set of relevant sources, including object detection, transformer-based detectors, open-vocabulary detection, metric learning, lightweight neural networks, quantization, pruning, and mobile segmentation models. The bibliography contains recent and topic-relevant papers. The adopted ideas are sufficiently distinguished from the student’s own implementation and experiments.

Points: 100

Implementation output

The student implemented a native Android application in Kotlin using Jetpack Compose. The application supports reference image storage, gallery management, object selection, detection, optional segmentation, cropping, background removal, embedding computation, similarity comparison, and model selection. The architecture follows the MVVM pattern and uses Room, MediaStore, StateFlow, model wrappers, and asynchronous processing, which indicates a well-designed Android implementation. The experimental part is also extensive. The student evaluated multiple embedding backbones and loss functions, fine-tuned models on a plant-specific dataset, compared the selected lightweight model with ImageNet-pretrained and CLIP baselines, and analyzed optimization methods such as quantization and pruning. The mobile performance evaluation is particularly valuable.

Points: 95

Usability of the results

The results are practically usable as a prototype Android application for reference-image-based plant identification and comparison. The work also provides useful experimental insight into the trade-offs between accuracy, latency, model size, and mobile inference backends.

Difficulty of the assignment

Evaluation level: more difficult assignment

The assignment is technically demanding, as it combines several challenging areas: object detection, low-shot and open-vocabulary recognition, metric learning, model optimization for mobile devices, and native Android application development. In addition to studying existing methods, the student was required to design and implement a complete mobile solution, adapt or train suitable models, and evaluate the system in real-world conditions.

Topics for thesis defence:
  1. What would be the expected bottleneck if the gallery contained thousands of reference images instead of tens of images, and how could this be solved efficiently on a mobile device?
  2. How would the proposed system need to be modified to support several visually different domains, such as plants, museum exhibits, and consumer products, at the same time?
Points proposed by reviewer: 95

Grade proposed by reviewer: A

Responsibility: Mgr. et Mgr. Hana Odstrčilová