Computer control with 9 hand gestures using a standard webcam
Computer vision, which automates human visual capabilities, has become a promising field with applications across many industries, including human-computer interaction that requires no contact with external control systems or peripherals. This paper presents a solution that recognizes hand gestures by analyzing three-dimensional landmarks at the hand joints. These landmarks, captured with a standard webcam, serve as input to an artificial neural network that identifies nine different gestures. A network architecture was designed, a proprietary dataset was created, and the network was trained. In addition, data preprocessing was implemented to normalize and transform the landmarks, improving the performance of the proposed model. Evaluation of the model showed 99.87% accuracy in recognizing the nine gestures. The model is implemented in an application called "Hand Controller", which controls a computer's keyboard and mouse through hand gestures and movements, achieving high performance in real-time hand gesture recognition.
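The preprocessing step described above (normalizing and transforming the three-dimensional landmarks before they reach the network) can be sketched as follows. This is a minimal illustration, not the authors' exact transform: it assumes 21 landmarks per hand in the layout produced by common hand-tracking libraries such as MediaPipe Hands, with the wrist as landmark 0, and uses wrist-relative translation plus scale normalization as one plausible choice.

```python
import numpy as np

def normalize_landmarks(landmarks):
    """Normalize 21 (x, y, z) hand landmarks into a 63-value feature vector.

    Hypothetical preprocessing sketch: the paper states only that the
    landmarks are normalized and transformed, not the exact transform used.
    """
    pts = np.asarray(landmarks, dtype=float).reshape(21, 3)
    pts = pts - pts[0]                 # translate: wrist (landmark 0) at the origin
    scale = np.linalg.norm(pts, axis=1).max()
    if scale > 0:
        pts = pts / scale              # scale: invariant to hand size and camera distance
    return pts.ravel()                 # flatten to a 63-element network input
```

Making the features invariant to the hand's position in the frame and its distance from the webcam is a common reason such a normalization improves classifier performance, since the network then only has to learn the gesture's shape.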