Abstract Emerging computer architectures and systems that combine multi-core CPUs with accelerator technologies, such as many-core Graphics Processing Units (GPUs) and Intel's Many Integrated Core (MIC) coprocessors, provide substantial computing power for many time-consuming spatial-temporal computations and applications. Although distributed computing environments are well suited to large-scale geospatial computation, this emerging advanced computing infrastructure remains largely unexplored in GIScience applications. This article introduces three categories of geospatial applications that exploit clusters of CPUs, GPUs and MICs, and compares their performance. Across these three benchmark tests, GPU clusters show a clear advantage for embarrassingly parallel workloads. For spatial computations with light communication between computing nodes, GPU clusters perform similarly to MIC clusters when large data volumes are applied. For applications with intensive data communication between computing nodes, MIC clusters can outperform GPU clusters. These conclusions should benefit future efforts by the GIScience community to deploy emerging heterogeneous computing infrastructure efficiently and achieve high-performance spatial computation over big data.
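The embarrassingly parallel case described above maps naturally onto independent per-tile tasks. Below is a minimal Python sketch of that pattern on a single multi-core node; the `normalize_tile` kernel and the in-memory tile list are hypothetical stand-ins for the paper's actual benchmarks, which ran across CPU, GPU and MIC clusters.

```python
# Minimal sketch of the embarrassingly parallel pattern: each raster tile
# is processed independently, so tiles can be distributed across processes
# (or cluster nodes) with no inter-task communication.
from multiprocessing import Pool

import numpy as np

def normalize_tile(tile: np.ndarray) -> np.ndarray:
    """Hypothetical per-tile kernel: rescale pixel values to [0, 1]."""
    lo, hi = tile.min(), tile.max()
    return (tile - lo) / (hi - lo) if hi > lo else np.zeros_like(tile)

if __name__ == "__main__":
    # Stand-in for tiles read from a large image; real workloads would
    # stream tiles from disk or object storage.
    tiles = [np.random.rand(1024, 1024) for _ in range(16)]
    with Pool(processes=4) as pool:
        results = pool.map(normalize_tile, tiles)  # no communication between tasks
    print(f"processed {len(results)} tiles")
```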
Interferometric synthetic aperture radar (InSAR) has developed rapidly over the past years and is considered an important method for surface deformation monitoring, benefiting from growing data quantities and improving data quality. However, handling SAR big data poses significant challenges for the associated algorithms and pipelines, particularly in large-scale SAR data processing. In addition, InSAR algorithms are highly complex and their task dependencies are intricate; efficient optimization models and task scheduling for the InSAR pipeline are lacking. In this paper, we design parallel time-series InSAR processing models based on multi-threading for efficient processing of InSAR big data. These models concentrate on parallelizing critical algorithms of high complexity, with a focus on decomposing two computationally intensive algorithms through loop unrolling. Our parallel models achieve a 10–20 times improvement in performance. We have also developed a parallel optimization tool, Simultaneous Task Automatic Runtime (STAR), which uses a data-flow optimization strategy with thread-pool technology to address the low CPU utilization caused by the many modules and task dependencies in the InSAR processing pipeline. STAR provides a data-driven pipeline and enables concurrent execution of multiple tasks, offering greater flexibility to keep the CPU busy and further improving CPU utilization through a predetermined task flow. Additionally, a supercomputing-based system has been constructed, following the time-series InSAR data processing framework, to process massive InSAR scientific big data and provide technical support for nationwide surface deformation measurement. Using this system, we processed InSAR data volumes of 500 TB and 700 TB in 5 and 7 days, respectively. Finally, we generated two maps of land surface deformation covering all of China.
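STAR itself is not described at the code level here, so the following is only a minimal sketch of the data-driven, thread-pool scheduling idea it embodies: a task is submitted as soon as all of its dependencies have completed, which keeps worker threads busy across pipeline modules. The toy task names and dependency graph are illustrative, not the actual InSAR stages.

```python
# Minimal sketch of data-driven task scheduling over a thread pool:
# launch any task whose dependencies are satisfied, as soon as they are.
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
import time

# task -> tasks it depends on (a toy DAG loosely shaped like an InSAR flow)
DEPS = {
    "coreg_A": [],
    "coreg_B": [],
    "ifg_AB": ["coreg_A", "coreg_B"],
    "filter": ["ifg_AB"],
    "unwrap": ["filter"],
    "geocode": ["unwrap"],
}

def run_task(name: str) -> str:
    time.sleep(0.1)  # stand-in for real computation
    return name

def schedule(deps):
    done, running = set(), {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        while len(done) < len(deps):
            # submit every task whose dependencies are all satisfied
            for task, reqs in deps.items():
                if task not in done and task not in running.values() \
                        and all(r in done for r in reqs):
                    running[pool.submit(run_task, task)] = task
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                done.add(running.pop(fut))
                print("finished:", fut.result())

schedule(DEPS)
```

Note that `coreg_A` and `coreg_B` run concurrently, while downstream stages start the moment their inputs are ready; this is the sense in which the pipeline is data-driven rather than module-by-module.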
Current approaches for cardiac amyloidosis (CA) identification are time-consuming, labor-intensive, and present challenges in sensitivity and accuracy, leading to limited treatment efficacy and poor prognosis for patients. In this retrospective study, we aimed to leverage machine learning (ML) to create a diagnostic model for CA using data from routine blood tests. Our dataset included 6,563 patients with left ventricular hypertrophy, 261 of whom had been diagnosed with CA. We divided the dataset into training and testing cohorts, applying ML algorithms such as logistic regression, random forest, and XGBoost for automated learning and prediction. Our model's diagnostic accuracy was then evaluated against CA biomarkers, specifically serum-free light chains (FLCs). The model's interpretability was elucidated by visualizing feature importance through a gain map. XGBoost outperformed both random forest and logistic regression in internal validation on the testing cohort, achieving an area under the curve (AUC) of 0.95 (95% CI: 0.92–0.97), sensitivity of 0.92 (95% CI: 0.86–0.98), specificity of 0.95 (95% CI: 0.94–0.97), and an F1 score of 0.89 (95% CI: 0.85–0.92). Its performance was also superior to the serum FLC-kappa and FLC-lambda combination (AUC of 0.88). Furthermore, XGBoost identified unique biomarker signatures indicative of multisystem dysfunction in CA patients, with significant changes in eGFR, FT3, cTnI, ANC, and NT-proBNP. This study develops a highly sensitive and accurate ML model for CA detection using routine clinical laboratory data, effectively streamlining diagnostic procedures, providing valuable clinical insights, and guiding future research into disease mechanisms.
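A minimal sketch of the modeling step, assuming the scikit-learn-style XGBoost API: train a classifier on tabular lab features, evaluate with AUC, and read gain-based feature importance. The data here are synthetic, so the AUC will be near chance; the feature names merely echo the markers reported above and do not reproduce the study's cohort.

```python
# Minimal sketch: gradient-boosted tree classifier on routine-lab features,
# evaluated by AUC, with gain-based feature importance for interpretability.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
features = ["eGFR", "FT3", "cTnI", "ANC", "NT-proBNP"]
X = rng.normal(size=(6563, len(features)))   # synthetic lab values
y = (rng.random(6563) < 0.04).astype(int)    # ~4% prevalence, echoing 261/6563

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = XGBClassifier(n_estimators=200, max_depth=4,
                      eval_metric="logloss", importance_type="gain")
model.fit(X_tr, y_tr)

auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"test AUC: {auc:.2f}")  # near 0.5 on random labels; 0.95 reported on real data

# Gain-based importances, the quantity behind the paper's gain map
for name, gain in zip(features, model.feature_importances_):
    print(name, round(float(gain), 3))
```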
The iterative self-organizing data analysis technique algorithm (ISODATA) was implemented on the supercomputers Kraken, Keeneland and Beacon to explore scalable, high-performance solutions for image processing and analytics using emerging advanced computer architectures. When 10 classes are extracted from one 18-GB image tile, the computation time can be reduced from several hours to no more than 90 seconds when 100 CPU, GPU or MIC processors are utilized. High-performance scalability tests were further conducted on Kraken using 10,800 processors to extract various numbers of classes from 12 image tiles totalling 216 GB. As the first geospatial computations over GPU clusters (Keeneland) and MIC clusters (Beacon), this research lays a solid foundation for exploring the potential of scalable, high-performance geospatial computation for next-generation cyber-enabled image analytics.
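For orientation, a minimal single-tile sketch of the ISODATA core loop follows: assign each pixel to the nearest class centre, recompute the centres, and merge centres that drift too close together. The parameters are illustrative and the variance-based split step is omitted for brevity; the per-pixel assignment step is the part that parallelizes across CPU, GPU or MIC processors in the clustered setting described above.

```python
# Minimal sketch of an ISODATA-style iteration on one image tile.
import numpy as np

def isodata(pixels: np.ndarray, k: int = 10, merge_dist: float = 0.05,
            iters: int = 20) -> np.ndarray:
    """pixels: (n, bands) array; returns the final cluster centres."""
    rng = np.random.default_rng(0)
    centres = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # assignment step: nearest centre per pixel (parallelizable per block)
        d = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: recompute each centre from its assigned pixels
        centres = np.array([pixels[labels == c].mean(axis=0)
                            if np.any(labels == c) else centres[c]
                            for c in range(len(centres))])
        # merge step: drop centres closer than merge_dist to a kept centre
        keep = []
        for c in range(len(centres)):
            if all(np.linalg.norm(centres[c] - centres[j]) > merge_dist
                   for j in keep):
                keep.append(c)
        centres = centres[keep]
    return centres

pixels = np.random.rand(10000, 3)  # stand-in for one tile's pixel spectra
print(isodata(pixels).shape)
```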