YAXIN HU

Behavioral and gaze-data analysis with deep learning neural networks
Update Nov. 2024
I joined the Pattern Recognition Company in Lübeck, Germany, on January 1st, 2022, and I am enrolled as a PhD student at the University of Lübeck, where I am supervised by Prof. Erhardt Barth.
Personal Background and Interest:
I spent the first 22 years of my life in China, where I received my Bachelor's degree in Electrical Engineering and Automation from the School of Mechatronic Engineering and Automation, Shanghai University. I then went to Singapore and obtained my Master's degree in Mechanical Engineering from the National University of Singapore. Since then, my research interests have been computer vision and deep learning for medicine. I would like to analyze medical data and design medical toolboxes using artificial intelligence.
Aim of the project:
The objectives of my project are to 1) perform a comprehensive evaluation of different deep-learning approaches for analyzing behavioral and gaze data, and 2) develop a machine-learning toolbox that can be used to discover diagnostic or rehabilitation relevance in these data.
Current activities:
Video Understanding: The cutting-edge architectures we studied for video understanding can be applied to the healthcare domain, for example to surgical robots for surgical video segmentation and workflow recognition, thereby alerting surgeons to possible complications, reducing operative mistakes, and supporting decision making. Moreover, video architectures can be applied to MRI data for detecting tumors and analysing neurodegenerative diseases such as Alzheimer's disease and Parkinson's disease.
Novel Designs of Video Transformers for Action Recognition
With the development of deep learning, video understanding has become a promising and challenging research field. In recent years, different transformer architectures have shown state-of-the-art performance on most benchmarks. Although transformers can process longer temporal sequences and therefore perform better than convolutional networks, they require huge datasets and have high computational costs. The inputs to video transformers are usually clips sampled from a video, and the length of the clips is limited by the available computing resources. In this work, we introduce novel methods to sample and tokenize the input video so as to better capture the dynamics of the input without a large increase in computational costs. Moreover, we introduce MinBlocks as a novel architecture inspired by neural processing in biological vision. The combination of variable tubes and MinBlocks improves network performance by 10.67%.
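To illustrate the variable-tube idea, below is a minimal, hypothetical sketch of how a video clip could be tokenized with tubes of different temporal extents before being fed to a transformer. The class name, tube depths, and patch size are my own assumptions for illustration and do not reproduce the MinBlocks architecture.

```python
# Illustrative sketch (not the exact method from the paper): tokenizing a video
# clip into tubelet embeddings with *variable* temporal extents, so that some
# tokens cover short time spans and others longer ones. Names and sizes are
# assumptions for demonstration only.
import torch
import torch.nn as nn

class VariableTubeTokenizer(nn.Module):
    def __init__(self, in_channels=3, embed_dim=768,
                 tube_depths=(2, 4, 8), patch_size=16):
        super().__init__()
        # One 3D-conv "tube" embedder per temporal extent.
        self.embedders = nn.ModuleList([
            nn.Conv3d(in_channels, embed_dim,
                      kernel_size=(d, patch_size, patch_size),
                      stride=(d, patch_size, patch_size))
            for d in tube_depths
        ])

    def forward(self, video):
        # video: (B, C, T, H, W)
        tokens = []
        for embed in self.embedders:
            x = embed(video)                             # (B, D, T', H', W')
            tokens.append(x.flatten(2).transpose(1, 2))  # (B, N, D)
        # Concatenate tokens from all temporal extents along the sequence axis.
        return torch.cat(tokens, dim=1)

# Example: an 8-frame 224x224 RGB clip.
clip = torch.randn(1, 3, 8, 224, 224)
print(VariableTubeTokenizer()(clip).shape)
```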
Salient Spatio-Temporal Slices on 2D-CNNs for Video Understanding
Video understanding remains a challenge even with advanced deep-learning methods that typically sample a few frames from which spatial and temporal features are extracted. Such down-sampling often leads to the loss of critical temporal information. Moreover, current state-of-the-art methods involve high computational costs. 2D Convolutional Neural Networks (2D-CNNs) have proven to be effective at capturing spatial features of images, but cannot make use of temporal information. To address these challenges, we propose to use 2D-CNNs not only on images, i.e. xy-slices of the video, but on salient spatio-temporal xt and yt slices to efficiently capture both spatial and temporal information of the entire video. As 2D-CNNs are known to extract local spatial orientation in xy, they can now extract motion, which is a local orientation in xt and yt. We complement the approach with a simple strategy for sampling the most informative slices and show that we can outperform alternative approaches in a number of tasks, especially in cases in which the actions are defined by their dynamics, i.e., by spatio-temporal patterns.
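As a rough illustration of the slicing idea, the following sketch extracts xt- and yt-slices from a grayscale video volume; the variance-based slice selection is only a stand-in assumption for the saliency-based sampling strategy used in the paper.

```python
# A minimal sketch of the slicing idea: besides the usual xy-frames, a video
# volume of shape (T, H, W) also contains xt-slices (fix y) and yt-slices
# (fix x) in which motion appears as local orientation. The "saliency"
# criterion below (temporal variance) is an assumption for illustration.
import numpy as np

def spatio_temporal_slices(video, n_slices=8):
    """video: grayscale volume of shape (T, H, W)."""
    xt = video.transpose(1, 0, 2)   # (H, T, W): one xt-slice per row y
    yt = video.transpose(2, 0, 1)   # (W, T, H): one yt-slice per column x
    # Crude "saliency": keep the slices whose intensity varies most over time.
    score_xt = video.var(axis=0).mean(axis=1)   # (H,)
    score_yt = video.var(axis=0).mean(axis=0)   # (W,)
    top_xt = xt[np.argsort(score_xt)[-n_slices:]]
    top_yt = yt[np.argsort(score_yt)[-n_slices:]]
    return top_xt, top_yt   # 2D images that a 2D-CNN can process directly

xt, yt = spatio_temporal_slices(np.random.rand(64, 112, 112))
print(xt.shape, yt.shape)   # (8, 64, 112) (8, 64, 112)
```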
Effective Use of Color and Temporal Information for Video Analysis
The modeling of temporal dependencies, and the associated computational load, remain challenges in video understanding. We here focus on using a more efficient sampling of color and temporal information. We sample color not from the same frame but from different consecutive frames to capture richer temporal information without increasing the computational load. We demonstrate the effectiveness of our approach for 2D-CNNs, 3D-CNNs, and Transformers, for which we obtain significant performance improvements on two benchmarks. The improvements are 2.43% on UCF101 and 4.55% on HMDB51 for ResNet18, 10.28% and 7.12% for the 3D-ResNet18, and 15.11% and 13.71% for the UniFormerV2. These improvements are obtained without additional costs by just changing the way color is sampled.
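The simplest variant of this color sampling can be sketched as follows; the tensor layout and the frame offsets (t, t+1, t+2) are assumptions made for illustration.

```python
# A minimal sketch of the sampling idea: each "color frame" fed to the network
# takes its R, G and B channels from three consecutive frames instead of one,
# so a single input already carries motion information at no extra cost.
import torch

def temporal_color_sampling(video):
    """video: (T, 3, H, W) float tensor; returns (T-2, 3, H, W)."""
    r = video[:-2, 0]    # red from frame t
    g = video[1:-1, 1]   # green from frame t+1
    b = video[2:, 2]     # blue from frame t+2
    return torch.stack([r, g, b], dim=1)

clip = torch.rand(16, 3, 224, 224)
print(temporal_color_sampling(clip).shape)   # torch.Size([14, 3, 224, 224])
```

The resulting frames can be fed to 2D-CNNs, 3D-CNNs, or transformers without changing the architecture, which is why the method adds no computational cost.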
Collaboration with Medical Doctors from the University Hospital Schleswig-Holstein (UKSH): Brain Fractal Dimension and Machine Learning can predict first-episode psychosis and risk for transition to psychosis.
The project concerns first-episode psychosis analysis based on brain MRI. We have four different groups: First-Episode Psychosis (FEP), Clinical High Risk with later Transition (CHR_T), Clinical High Risk without later Transition (CHR_NT), and Healthy Controls (HC). We would like to find differences among these groups and to determine whether the transition to psychosis is predictable. We first extract statistical features from the MRI and fractal-dimension data, then use a feature selector to pick the most informative features, and finally apply different classifiers. We find promising results when using the fractal dimension as an identifying biomarker for psychosis.
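The pipeline can be sketched roughly as follows with scikit-learn; the particular feature selector (ANOVA F-test), classifier (SVM), and feature counts are placeholders, since the study compares several options.

```python
# A hedged sketch of the analysis pipeline described above: statistical and
# fractal-dimension features per subject -> feature selection -> classifier.
# Selector, classifier and feature counts are assumptions for illustration.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# X: one row per subject (statistical + fractal-dimension features),
# y: group label, e.g. 0=HC, 1=FEP, 2=CHR_NT, 3=CHR_T.
X = np.random.rand(80, 200)
y = np.random.randint(0, 4, size=80)

clf = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=20)),  # keep the 20 most informative features
    ("svm", SVC(kernel="rbf", C=1.0)),
])
print(cross_val_score(clf, X, y, cv=5).mean())
```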
Collaboration with Safa (ESR7): Virtual Reality-based Assessment of Locomotion and Navigation in Glaucoma.
The dataset consists of 14 glaucoma patients and 15 age-matched controls. The goal of the experiment was to evaluate the performance of the two groups on a path integration (PI) task under different environmental conditions, specifically daytime and dawn, using immersive virtual reality (VR). In this task, participants walked along a path marked by three checkpoints and subsequently indicated the starting location of the path. We obtain motion-behaviour data along the path and extract hand-crafted features such as time, position, speed, and acceleration from these data. We then use a 1D-CNN to process these features and an SVM classifier for the final classification. Our machine-learning model achieved 70% accuracy.
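A minimal sketch of this setup is shown below, assuming the hand-crafted features form a multichannel time series per trial; the channel counts, layer sizes, and training details are illustrative only.

```python
# A minimal sketch: a small 1D-CNN turns each trial's feature time series into
# a fixed-length embedding, and an SVM separates patients from controls.
# All sizes are assumptions for illustration.
import torch
import torch.nn as nn
from sklearn.svm import SVC

class FeatureEncoder1D(nn.Module):
    def __init__(self, in_channels=8, embed_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(16, embed_dim, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # collapse the time axis
        )

    def forward(self, x):               # x: (B, channels, time)
        return self.net(x).squeeze(-1)  # (B, embed_dim)

# Toy data: 29 trials (14 patients + 15 controls), 8 features over 500 samples.
trials = torch.randn(29, 8, 500)
labels = [1] * 14 + [0] * 15
with torch.no_grad():
    embeddings = FeatureEncoder1D()(trials).numpy()
svm = SVC().fit(embeddings, labels)   # in practice: CNN trained first, then cross-validated
print(svm.score(embeddings, labels))
```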
Collaboration with Kurt (ESR6) and IIT: A Machine Learning Approach to Unveil Balance Behavior Through Aging with an Auditory Cue
The dataset consists of 10 younger subjects, 9 middle-aged subjects, and 9 elderly subjects. We use the Vicon system to collect motion data from subjects undergoing Fukuda tests. We then compute features such as displacement along the x-axis, displacement along the y-axis, linear displacement, angular displacement, and angle of rotation, and use machine-learning algorithms to analyze these features. We find that aging affects the balance behavior of humans and that audio cues are useful for self-motion perception.
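A simplified sketch of the feature computation is given below, assuming the Vicon system provides a 2D trajectory and a heading angle per sample during the stepping test; the exact marker set and preprocessing are not shown.

```python
# A minimal sketch of the feature computation described above, assuming a 2D
# trajectory (x, y) and a heading angle per sample from the Fukuda test.
import numpy as np

def fukuda_features(xy, heading):
    """xy: (N, 2) positions in metres; heading: (N,) orientation in radians."""
    dx = xy[-1, 0] - xy[0, 0]                         # displacement along the x-axis
    dy = xy[-1, 1] - xy[0, 1]                         # displacement along the y-axis
    linear = np.hypot(dx, dy)                         # linear displacement
    angular = np.degrees(np.arctan2(dy, dx))          # angular displacement of the path
    rotation = np.degrees(heading[-1] - heading[0])   # angle of body rotation
    return {"dx": dx, "dy": dy, "linear": linear,
            "angular": angular, "rotation": rotation}

print(fukuda_features(np.cumsum(np.random.randn(100, 2) * 0.01, axis=0),
                      np.linspace(0.0, 0.4, 100)))
```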
Collaboration with Safa (ESR7): A novel deep learning approach to assess visual system integrity from movement patterns during treadmill walking – a pilot study.
Glaucoma is a common eye disease, mostly found in the elderly, which may lead to irreversible blindness and brings considerable inconvenience to patients' lives. Early detection and intervention can effectively prevent irreversible vision loss. Glaucoma is one of the leading causes of visual impairment, and visual impairment may affect movement. Therefore, studying the relationship between visual impairment and movement has clinical value and also provides a potential method for the diagnosis of eye diseases. There are many clinical methods to diagnose glaucoma, but they usually require the full participation of clinicians, which is cumbersome and time-consuming. Here we propose a novel convolutional neural network framework that uses existing visual-test videos to study whether glaucoma patients and visually impaired people show motion patterns different from those of healthy controls, so that visual impairments can be diagnosed by an AI toolbox from videos without requiring excessive clinical resources. We achieve a top-1 accuracy of up to 85% for classifying visually impaired subjects versus healthy controls.
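As a rough illustration of the general setup (not the exact network used in the study), the sketch below fine-tunes a pretrained video backbone for binary classification of walking-video clips; the backbone choice (r3d_18) and the clip size are assumptions.

```python
# A hedged sketch: fine-tune a pretrained video backbone to classify short
# walking-video clips into visually impaired vs. healthy control.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18, R3D_18_Weights

model = r3d_18(weights=R3D_18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)   # 2 classes: impaired vs. control

clips = torch.randn(4, 3, 16, 112, 112)         # batch of 16-frame clips
labels = torch.tensor([0, 1, 0, 1])
loss = nn.CrossEntropyLoss()(model(clips), labels)
loss.backward()                                 # one illustrative training step
print(loss.item())
```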
Future directions:
In the near future, I will continue my work on AI in healthcare. With AI increasingly transforming industries and driving the fourth industrial revolution, I am looking forward to leveraging my skills in AI to make a meaningful contribution to the development of intelligent healthcare platforms.
My OptiVisT experience:
As a member of the OptiVisT programme, I have had the privilege of collaborating with friendly, supportive colleagues and knowledgeable supervisors. Over the past three years, I have significantly enhanced my scientific research skills while also receiving extensive training for junior researchers. The workshops and conferences I attended during this programme have been both enjoyable and highly beneficial. Most importantly, I have cultivated essential qualities of a successful researcher and forged many lasting friendships through this programme.
Project output
I have attended the following conferences to present my work:
- The 22nd International Conference on Image Analysis and Processing (ICIAP 2023); Udine, Italy; 11-15 September 2023.
- The 19th edition of the IEEE International Symposium on Medical Measurements and Applications (MeMeA2024); Eindhoven, Netherlands; 26 - 28 June 2024.
- The International Joint Conference on Neural Networks (IJCNN 2024); Yokohama, Japan; 30 June - 5 July 2024.
- The 46th European Conference on Visual Perception (ECVP); Aberdeen, UK; 25 - 29 August 2024.
- The 33rd International Conference on Artificial Neural Networks (ICANN); Lugano, Switzerland; 17 - 20 September 2024.
- The 31st International Conference on Neural Information Processing (ICONIP); Auckland, New Zealand; 2 - 6 December 2024.
My current publications are:
- Hu, Y., and Barth, E. (2024, June). Novel Design Ideas that Improve Video Understanding Networks with Transformers. In 2024 International Joint Conference on Neural Networks (IJCNN) (pp. 1-7). IEEE. DOI: 10.1109/IJCNN60899.2024.10649969
- Guarischi, M., Hu, Y., Kurt, A. B., Zanchi, S., Barth, E., and Gori, M. (2024, June). A Machine Learning Approach to Unveil Balance Behavior Through Aging with an Auditory Cue. In 2024 IEEE International Symposium on Medical Measurements and Applications (MeMeA) (pp. 1-6). IEEE. DOI: 10.1109/MeMeA60663.2024.10596824
- Hu, Y., and Barth, E. (2024, September). Video Understanding Using 2D-CNNs on Salient Spatio-Temporal Slices. In the International Conference on Artificial Neural Networks (pp. 256-270). Cham: Springer Nature Switzerland. DOI: 10.1007/978-3-031-72338-4_18
- Hu, Y., and Barth, E. (2024, December). How to Efficiently Use Color and Temporal Information for Video Understanding? (Accepted at ICONIP2024)
- H. Yaxin, F. Marina, A. Christina, et al., “Brain fractal dimension and machine learning can predict first-episode psychosis and risk for transition to psychosis,” Computers in Biology and Medicine, Under Review.
Contact
Interested in my work and want to get in touch? Send me an e-mail at yaxin.hu@student.uni-luebeck.de or contact me via LinkedIn: https://www.linkedin.com/in/yaxin-hu-bb28b9211/