Supervised learning-based recommendation models, which depend on sufficient high-quality training samples, have been widely applied in many domains. In the era of big data, with its explosive growth in data volume, training samples must be labelled promptly and accurately to guarantee the strong recommendation performance of supervised learning-based models. Machine annotation cannot label training samples with high quality because of limited machine intelligence, and although expert annotation achieves high accuracy, it requires considerable time and resources. As a new way for human intelligence to participate in machine computing, crowdsourcing annotation compensates for the shortcomings of both machine annotation and expert annotation. Therefore, in this paper, we use crowdsourcing annotation to label training samples. First, a suitable crowdsourcing mechanism is designed to create crowdsourcing annotation tasks for training-sample labelling; then, two entropy-based ground truth inference algorithms (i.e., HILED and HILI) are proposed to improve the quality of the noisy labels provided by the crowd. In addition, descending and random task-ordering strategies in crowdsourcing annotation are also explored. The experimental results demonstrate that crowdsourcing annotation significantly improves on the performance of machine annotation. Among the ground truth inference algorithms, both HILED and HILI improve on the baselines, and HILED performs better than HILI.
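The abstract does not give the details of HILED or HILI, but the core idea of entropy-based ground truth inference can be illustrated with a minimal sketch: aggregate each item's crowd labels by majority vote and use the Shannon entropy of the label distribution as an uncertainty score (low entropy means high worker agreement). All function and variable names here are illustrative, not the paper's.

```python
import math
from collections import Counter

def entropy(label_counts):
    """Shannon entropy (in bits) of a label-count distribution."""
    total = sum(label_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in label_counts.values() if c > 0)

def infer_labels(annotations):
    """Infer one label per item from noisy crowd labels.

    annotations: dict mapping item_id -> list of worker labels.
    Returns (inferred, uncertainty): the majority label per item and
    the entropy of its label distribution. An entropy-based inference
    algorithm can prioritise or re-annotate high-entropy items.
    """
    inferred, uncertainty = {}, {}
    for item, labels in annotations.items():
        counts = Counter(labels)
        inferred[item] = counts.most_common(1)[0][0]
        uncertainty[item] = entropy(counts)
    return inferred, uncertainty

# Illustrative crowd annotations for two items.
votes = {"q1": ["cat", "cat", "dog"], "q2": ["dog", "dog", "dog"]}
labels, H = infer_labels(votes)
```

Unanimous items such as `q2` get entropy 0, while contested items such as `q1` get positive entropy, flagging them as candidates for further quality improvement.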
Haze obscures remote sensing images, making it difficult to extract valuable information. To address this problem, we propose a fine detail extraction network that aims to restore image details and improve image quality. Specifically, to capture fine details, we design multi-scale and multi-dimensional extraction blocks and then fuse them to optimize feature extraction. The multi-scale extraction block adopts multi-scale pixel attention and channel attention to extract and combine global and local information from the image. Meanwhile, the multi-dimensional extraction block uses depthwise separable convolutional layers to capture additional dimensional information. Additionally, we integrate an atmospheric scattering model unit into the network to enhance both the dehazing effectiveness and stability. Our experiments on the SateHaze1k and HRSD datasets demonstrate that the proposed method efficiently handles remote sensing images with varying levels of haze, successfully recovers fine details, and achieves superior results compared to existing state-of-the-art dehazing techniques.
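The atmospheric scattering model unit mentioned above is based on the standard haze formation model I(x) = J(x)t(x) + A(1 - t(x)), where I is the hazy image, J the scene radiance, t the transmission map, and A the atmospheric light. A minimal sketch of inverting this model (not the paper's learned network, which estimates t and A with attention blocks) looks like:

```python
import numpy as np

def dehaze(I, A, t, t_min=0.1):
    """Invert the atmospheric scattering model I = J*t + A*(1 - t)
    to recover the haze-free scene radiance J.

    I: hazy image with values in [0, 1]; A: global atmospheric light;
    t: per-pixel transmission map, clamped below by t_min so that
    division stays numerically stable in dense haze.
    """
    t = np.maximum(t, t_min)
    J = (I - A) / t + A
    return np.clip(J, 0.0, 1.0)

# Synthesize a hazy image from a known scene and recover it.
J = np.array([[0.2, 0.8]])
t = np.array([[0.5, 0.9]])
A = 0.9
I = J * t + A * (1 - t)
J_hat = dehaze(I, A, t)
```

Embedding this physical inversion as a network unit constrains the learned mapping, which is what gives the model its stability benefit.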
Graph embedding transforms high-dimensional graphs into a lower-dimensional vector space while preserving their structural information and properties. Context-sensitive graph embedding, in particular, performs well in tasks such as link prediction and ranking recommendations. However, existing context-sensitive graph embeddings have limitations: they require additional information, depend on community algorithms to capture multiple contexts, or fail to capture sufficient structural information. In this paper, we propose a novel Graph Embedding with Similarity Metric Learning (GESML). The core of GESML is to learn the optimal graph structure using an attention-based symmetric similarity metric function and to establish association relationships between nodes through top-k pooling. Its primary advantage is that it requires no additional features or multiple contexts, using only the symmetric similarity metric function and pooling operations to encode sufficient topological information for each node. Experimental results on three datasets covering link prediction and node-clustering tasks demonstrate that GESML significantly outperforms a state-of-the-art (SOTA) baseline on all tasks.
Grouping-based differential privacy histogram publishing methods cannot balance the grouping reconstruction error against the Laplace noise error, resulting in insufficient publishing accuracy. To address this problem, we propose a symmetric histogram publishing method, DPHR (differential privacy histogram released). First, the algorithm uses the exponential mechanism to globally sort the bucket counts of the original histogram, improving grouping accuracy; second, we propose an optimal symmetric dynamic-programming grouping algorithm that operates on the ordered histogram and uses the global minimum error as its error evaluation function. In this way, a globally optimal grouping is obtained that balances the reconstruction error against the Laplace noise error. Experiments show that, while satisfying ε-differential privacy, this method effectively reduces the cumulative error between the published and original histograms under long-range counting queries and improves the usability of the published histogram data.
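The error trade-off at the heart of grouping-based publishing can be sketched with the standard Laplace mechanism. Each group is released as its noisy mean: averaging within a group introduces reconstruction error, but since a single individual changes one bucket count by at most 1, the group mean has sensitivity 1/|g| and needs proportionally less noise. This is a generic illustration of the trade-off DPHR optimises, not the DPHR algorithm itself (which additionally sorts buckets via the exponential mechanism and chooses groups by dynamic programming).

```python
import numpy as np

def publish_grouped_histogram(counts, groups, epsilon, rng=None):
    """Publish a differentially private histogram by grouping buckets,
    adding Laplace noise to each group mean, and broadcasting the
    noisy mean back to the group's buckets.

    counts: np.ndarray of bucket counts; groups: partition of bucket
    indices. For counting queries the sensitivity of a group mean is
    1/len(g), so the Laplace scale is 1 / (epsilon * len(g)): larger
    groups mean less noise error but more reconstruction error.
    """
    rng = rng or np.random.default_rng()
    out = np.empty_like(counts, dtype=float)
    for g in groups:  # g is a list of bucket indices
        mean = counts[g].mean()
        noisy = mean + rng.laplace(0.0, 1.0 / (epsilon * len(g)))
        out[g] = noisy
    return out

# Two well-separated groups; with a very weak privacy budget the
# published histogram should sit close to the group means.
counts = np.array([1.0, 1.0, 5.0, 5.0])
groups = [[0, 1], [2, 3]]
released = publish_grouped_histogram(counts, groups, epsilon=1e6,
                                     rng=np.random.default_rng(0))
```

If the grouping is poor (e.g. buckets 1 and 5 in one group), the reconstruction error dominates regardless of ε, which is why DPHR searches for the globally error-optimal grouping on the sorted histogram.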