Abstract Emerging computer architectures and systems that combine multi-core CPUs with accelerator technologies, such as many-core Graphics Processing Units (GPUs) and Intel's Many Integrated Core (MIC) coprocessors, provide substantial computing power for many time-consuming spatial-temporal computations and applications. Although distributed computing environments are well suited to large-scale geospatial computation, this emerging advanced computing infrastructure remains largely unexplored in GIScience applications. This article introduces three categories of geospatial applications that exploit clusters of CPUs, GPUs and MICs, and compares their performance. Across these three benchmark tests, GPU clusters show a clear advantage for embarrassingly parallel workloads. For spatial computations with light communication between computing nodes, GPU clusters perform similarly to MIC clusters when large data volumes are applied. For applications with intensive data communication between computing nodes, MIC clusters can outperform GPU clusters. These conclusions should benefit future efforts by the GIScience community to deploy emerging heterogeneous computing infrastructure efficiently and achieve high-performance spatial computation over big data.
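The embarrassingly parallel case described above maps naturally onto independent per-tile tasks. Below is a minimal Python sketch of that pattern on a single multi-core node; the `normalize_tile` kernel and the in-memory tile list are hypothetical stand-ins for the paper's actual benchmarks, which ran across CPU, GPU and MIC clusters.

```python
# Minimal sketch of the embarrassingly parallel pattern: each raster tile
# is processed independently, so tiles can be distributed across processes
# (or cluster nodes) with no inter-task communication.
from multiprocessing import Pool

import numpy as np

def normalize_tile(tile: np.ndarray) -> np.ndarray:
    """Hypothetical per-tile kernel: rescale pixel values to [0, 1]."""
    lo, hi = tile.min(), tile.max()
    return (tile - lo) / (hi - lo) if hi > lo else np.zeros_like(tile)

if __name__ == "__main__":
    # Stand-in for tiles read from a large image; real workloads would
    # stream tiles from disk or object storage.
    tiles = [np.random.rand(1024, 1024) for _ in range(16)]
    with Pool(processes=4) as pool:
        results = pool.map(normalize_tile, tiles)  # no communication between tasks
    print(f"processed {len(results)} tiles")
```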
Interferometric synthetic aperture radar (InSAR) has developed rapidly over the past years and is considered an important method for surface deformation monitoring, benefiting from growing data quantities and improving data quality. However, handling SAR big data poses significant challenges for the associated algorithms and pipelines, particularly in large-scale SAR data processing. In addition, InSAR algorithms are highly complex and their task dependencies are intricate; efficient optimization models and task scheduling for the InSAR pipeline are lacking. In this paper, we design parallel time-series InSAR processing models based on multi-threading for efficient processing of InSAR big data. These models concentrate on parallelizing critical algorithms of high complexity, with a focus on decomposing two computationally intensive algorithms through loop unrolling. Our parallel models achieve a 10–20 times improvement in performance. We have also developed a parallel optimization tool, Simultaneous Task Automatic Runtime (STAR), which uses a data-flow optimization strategy with thread-pool technology to address the low CPU utilization caused by the many modules and task dependencies in the InSAR processing pipeline. STAR provides a data-driven pipeline and enables concurrent execution of multiple tasks, offering greater flexibility to keep the CPU busy and further improving CPU utilization through a predetermined task flow. Additionally, a supercomputing-based system has been constructed, following the time-series InSAR data processing framework, to process massive InSAR scientific big data and provide technical support for nationwide surface deformation measurement. Using this system, we processed InSAR data volumes of 500 TB and 700 TB in 5 and 7 days, respectively. Finally, we generated two maps of land surface deformation covering all of China.
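STAR itself is not described at the code level here, so the following is only a minimal sketch of the data-driven, thread-pool scheduling idea it embodies: a task is submitted as soon as all of its dependencies have completed, which keeps worker threads busy across pipeline modules. The toy task names and dependency graph are illustrative, not the actual InSAR stages.

```python
# Minimal sketch of data-driven task scheduling over a thread pool:
# launch any task whose dependencies are satisfied, as soon as they are.
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
import time

# task -> tasks it depends on (a toy DAG loosely shaped like an InSAR flow)
DEPS = {
    "coreg_A": [],
    "coreg_B": [],
    "ifg_AB": ["coreg_A", "coreg_B"],
    "filter": ["ifg_AB"],
    "unwrap": ["filter"],
    "geocode": ["unwrap"],
}

def run_task(name: str) -> str:
    time.sleep(0.1)  # stand-in for real computation
    return name

def schedule(deps):
    done, running = set(), {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        while len(done) < len(deps):
            # submit every task whose dependencies are all satisfied
            for task, reqs in deps.items():
                if task not in done and task not in running.values() \
                        and all(r in done for r in reqs):
                    running[pool.submit(run_task, task)] = task
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                done.add(running.pop(fut))
                print("finished:", fut.result())

schedule(DEPS)
```

Note that `coreg_A` and `coreg_B` run concurrently, while downstream stages start the moment their inputs are ready; this is the sense in which the pipeline is data-driven rather than module-by-module.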
Current approaches for cardiac amyloidosis (CA) identification are time-consuming, labor-intensive, and present challenges in sensitivity and accuracy, leading to limited treatment efficacy and poor prognosis for patients. In this retrospective study, we aimed to leverage machine learning (ML) to create a diagnostic model for CA using data from routine blood tests. Our dataset included 6,563 patients with left ventricular hypertrophy, 261 of whom had been diagnosed with CA. We divided the dataset into training and testing cohorts, applying ML algorithms such as logistic regression, random forest, and XGBoost for automated learning and prediction. Our model's diagnostic accuracy was then evaluated against CA biomarkers, specifically serum-free light chains (FLCs). The model's interpretability was elucidated by visualizing feature importance through a gain map. XGBoost outperformed both random forest and logistic regression in internal validation on the testing cohort, achieving an area under the curve (AUC) of 0.95 (95% CI: 0.92–0.97), sensitivity of 0.92 (95% CI: 0.86–0.98), specificity of 0.95 (95% CI: 0.94–0.97), and an F1 score of 0.89 (95% CI: 0.85–0.92). Its performance was also superior to the serum FLC-kappa and FLC-lambda combination (AUC of 0.88). Furthermore, XGBoost identified unique biomarker signatures indicative of multisystem dysfunction in CA patients, with significant changes in eGFR, FT3, cTnI, ANC, and NT-proBNP. This study develops a highly sensitive and accurate ML model for CA detection using routine clinical laboratory data, effectively streamlining diagnostic procedures, providing valuable clinical insights, and guiding future research into disease mechanisms.
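A minimal sketch of the modeling step, assuming the scikit-learn-style XGBoost API: train a classifier on tabular lab features, evaluate with AUC, and read gain-based feature importance. The data here are synthetic, so the AUC will be near chance; the feature names merely echo the markers reported above and do not reproduce the study's cohort.

```python
# Minimal sketch: gradient-boosted tree classifier on routine-lab features,
# evaluated by AUC, with gain-based feature importance for interpretability.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
features = ["eGFR", "FT3", "cTnI", "ANC", "NT-proBNP"]
X = rng.normal(size=(6563, len(features)))   # synthetic lab values
y = (rng.random(6563) < 0.04).astype(int)    # ~4% prevalence, echoing 261/6563

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = XGBClassifier(n_estimators=200, max_depth=4,
                      eval_metric="logloss", importance_type="gain")
model.fit(X_tr, y_tr)

auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"test AUC: {auc:.2f}")  # near 0.5 on random labels; 0.95 reported on real data

# Gain-based importances, the quantity behind the paper's gain map
for name, gain in zip(features, model.feature_importances_):
    print(name, round(float(gain), 3))
```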
The iterative self-organizing data analysis technique algorithm (ISODATA) was implemented on the supercomputers Kraken, Keeneland and Beacon to explore scalable, high-performance solutions for image processing and analytics using emerging advanced computer architectures. When 10 classes are extracted from one 18-GB image tile, the computation time can be reduced from several hours to no more than 90 seconds when 100 CPU, GPU or MIC processors are utilized. High-performance scalability tests were further conducted on Kraken using 10,800 processors to extract various numbers of classes from 12 image tiles totalling 216 GB. As the first geospatial computations over GPU clusters (Keeneland) and MIC clusters (Beacon), this research lays a solid foundation for exploring the potential of scalable, high-performance geospatial computation for next-generation cyber-enabled image analytics.
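For orientation, a minimal single-tile sketch of the ISODATA core loop follows: assign each pixel to the nearest class centre, recompute the centres, and merge centres that drift too close together. The parameters are illustrative and the variance-based split step is omitted for brevity; the per-pixel assignment step is the part that parallelizes across CPU, GPU or MIC processors in the clustered setting described above.

```python
# Minimal sketch of an ISODATA-style iteration on one image tile.
import numpy as np

def isodata(pixels: np.ndarray, k: int = 10, merge_dist: float = 0.05,
            iters: int = 20) -> np.ndarray:
    """pixels: (n, bands) array; returns the final cluster centres."""
    rng = np.random.default_rng(0)
    centres = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # assignment step: nearest centre per pixel (parallelizable per block)
        d = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: recompute each centre from its assigned pixels
        centres = np.array([pixels[labels == c].mean(axis=0)
                            if np.any(labels == c) else centres[c]
                            for c in range(len(centres))])
        # merge step: drop centres closer than merge_dist to a kept centre
        keep = []
        for c in range(len(centres)):
            if all(np.linalg.norm(centres[c] - centres[j]) > merge_dist
                   for j in keep):
                keep.append(c)
        centres = centres[keep]
    return centres

pixels = np.random.rand(10000, 3)  # stand-in for one tile's pixel spectra
print(isodata(pixels).shape)
```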