Data Is King: Candidate Label Set Pruning in Deep Partial-Label Learning

In recent years, deep learning has achieved remarkable success across many domains, but training deep models effectively usually demands large amounts of perfectly annotated data, which is a major obstacle in practice. Partial-label learning (PLL) emerged as a way to balance data quality against annotation cost: each training example is assigned a set of candidate labels, exactly one of which is the true label. For instance, an image of a husky might be annotated with the candidate set {husky, wolf, malamute}, only one of which is correct.

Conventional deep PLL research tackles the problem from a learning-centric perspective, designing training strategies that resolve label ambiguity, for example by identifying the true label hidden in the candidate set. When the candidate label set grows too large, however, these learning strategies struggle to single out the true label and model performance degrades.

This article introduces a new data-centric method, candidate label set pruning (CLSP), which filters potentially false labels out of the candidate sets in a training-free manner.

CLSP: A Data-Centric Perspective

The core idea of CLSP is to exploit the inconsistency between the representation space and the candidate label space to identify false labels. Concretely, for each candidate label of a training example, if that label does not appear in the candidate sets of the example's nearest neighbors in the representation space, it is likely to be false.

Building on this intuition, the paper proposes an instance-wise pruning scheme that measures how likely each candidate label is to be false by counting how many of the example's nearest neighbors do not carry that label in their own candidate sets. Formally, for each candidate label $j \in Y_i$ of example $x_i$, define the score

$$
O_{ij} = \sum_{v=1}^{k} \mathbb{I}\big[\, j \notin Y_{p_{v}^{i}} \big], \quad \forall j \in Y_i,
$$

where $Y_{p_{v}^{i}}$ denotes the candidate label set of the $v$-th nearest neighbor of $x_i$ and $\mathbb{I}[\cdot]$ is the indicator function. The larger $O_{ij}$ is, the more neighbors lack label $j$ in their candidate sets, and the more likely $j$ is a false label.
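Below is a minimal sketch of this scoring step in Python, assuming feature vectors have already been extracted by a pretrained encoder; the function and variable names (`candidate_scores`, `features`, `candidate_sets`) are illustrative and not taken from the paper's released code:

```python
from sklearn.neighbors import NearestNeighbors

def candidate_scores(features, candidate_sets, k):
    """Compute O_ij for every example i and candidate label j in Y_i:
    the number of x_i's k nearest neighbors whose own candidate sets
    do NOT contain j. A larger O_ij suggests j is more likely false.

    features:       (n, d) array of representations
    candidate_sets: list of n sets of candidate label indices
    """
    # k + 1 neighbors, because each point's nearest neighbor is itself
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn.kneighbors(features)          # idx has shape (n, k + 1)
    scores = []
    for i, Y_i in enumerate(candidate_sets):
        neighbors = idx[i, 1:]                # drop the point itself
        scores.append({j: sum(j not in candidate_sets[v] for v in neighbors)
                       for j in Y_i})
    return scores
```

Brute-force exact search is used here only for brevity; at scale one would switch to an approximate-nearest-neighbor index, and the quality of the representations directly determines how trustworthy the scores are.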

Pruning is then driven by the values of $O_{ij}$: for each example $x_i$, the $\gamma_i$ candidate labels with the largest scores are removed from its candidate set:

$$
r_{Y_i} = \mathrm{Top}\text{-}\gamma_i\text{-}\mathop{\mathrm{argmax}}_{j \in Y_i} \, O_{ij},
$$

where $\mathrm{Top}\text{-}\gamma_i\text{-}\mathrm{argmax}$ returns the indices of the $\gamma_i$ candidate labels with the largest $O_{ij}$ values, so the pruned candidate set is $Y_i \setminus r_{Y_i}$.
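Continuing the sketch above, the pruning step drops the $\gamma_i$ highest-scoring candidates per example; how $\gamma_i$ itself is chosen per instance is part of the paper's scheme and not reproduced here:

```python
def prune(candidate_sets, scores, gammas):
    """Remove from each Y_i the gamma_i candidate labels with the largest
    O_ij scores (the Top-gamma_i-argmax); ties are broken by sort order."""
    pruned = []
    for Y_i, O_i, g in zip(candidate_sets, scores, gammas):
        assert g < len(Y_i), "at least one candidate label must survive"
        ranked = sorted(Y_i, key=lambda j: O_i[j], reverse=True)
        pruned.append(Y_i - set(ranked[:g]))  # Y_i \ r_{Y_i}
    return pruned
```

Note the guard: $\gamma_i$ must stay below $|Y_i|$, otherwise an example could lose every candidate, including the true label.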

Theoretical Analysis

The paper analyzes the pruning error of CLSP, proving an upper bound on the probability of mistakenly pruning the true label and characterizing how representation quality affects this bound.

Theorem 1: Assume the PLL dataset satisfies $(k, \delta_k, \rho_k)$ label distinguishability (a condition, defined in the paper, that quantifies via $\delta_k$ and $\rho_k$ how reliably an example's $k$ nearest neighbors reflect its true label rather than its false candidates). For each PLL example $(x_i, Y_i)$, suppose the $y$-th label in $Y_i$ is the true label, and let $Y_i^{1} = Y_i \setminus \{y\}$ denote its set of false candidate labels, so any $y_1 \in Y_i^{1}$ satisfies $y_1 \neq y$. Given the number $\gamma_i$ of labels to prune, the probability of erroneously pruning the true label, $P(y \in r_{Y_i})$, is bounded above by:

$$
P\big(y \in r_{Y_i}\big) \leq \sum_{j=1}^{k} \sum_{m=\xi_i}^{|Y_i^{1}|} \binom{|Y_i^{1}|}{m} \eta^m (1-\eta)^{|Y_i^{1}|-m} \, \delta_k \binom{k}{j},
$$

where $\xi_i = |Y_i^{1}| - \gamma_i + 1$; $\eta = I_{\rho_k}(k-j+1,\, j)$, which varies with the outer summation index $j$; $I_x(a, b)$ denotes the regularized incomplete beta function; and $\binom{n}{r}$ is the binomial coefficient.
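To make the bound concrete, it can be evaluated numerically. The sketch below is an illustration with made-up parameter values; it relies on `scipy.special.betainc(a, b, x)`, which computes exactly the regularized incomplete beta function $I_x(a, b)$:

```python
from math import comb
from scipy.special import betainc  # betainc(a, b, x) = I_x(a, b)

def theorem1_bound(k, delta_k, rho_k, n_false, gamma):
    """Evaluate the Theorem 1 upper bound on the probability of
    erroneously pruning the true label of a single example.

    n_false: |Y_i^1|, the number of false candidate labels
    gamma:   the number of labels pruned for this example
    """
    xi = n_false - gamma + 1
    bound = 0.0
    for j in range(1, k + 1):
        eta = betainc(k - j + 1, j, rho_k)    # eta depends on j
        tail = sum(comb(n_false, m) * eta**m * (1 - eta)**(n_false - m)
                   for m in range(xi, n_false + 1))
        bound += tail * delta_k * comb(k, j)
    return min(bound, 1.0)  # a probability bound above 1 is vacuous

# hypothetical numbers: 10 neighbors, 5 false candidates, prune 2 labels
print(theorem1_bound(k=10, delta_k=0.01, rho_k=0.1, n_false=5, gamma=2))
```

Playing with the inputs mirrors the analysis: a larger $\rho_k$ (noisier neighborhoods) increases $\eta$ and hence the bound, while a larger $\gamma$ lowers $\xi_i$ and adds binomial tail terms, which is exactly the increment Theorem 2 quantifies.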

Theorem 2: Under the assumptions of Theorem 1, when the number of pruned labels is increased from $\gamma_i^1$ to $\gamma_i^2$ (i.e., $\gamma_i^2 > \gamma_i^1$), the additional pruning error is bounded above by:

$$
P_{\gamma_i^2}\big(y \in r_{Y_i}\big) - P_{\gamma_i^1}\big(y \in r_{Y_i}\big) \leq \sum_{j=1}^{k} \sum_{m=\xi_2^i}^{\xi_1^i - 1} \binom{|Y_i^{1}|}{m} \eta^m (1-\eta)^{|Y_i^{1}|-m} \, \delta_k \binom{k}{j},
$$

where $P_{\gamma}(y \in r_{Y_i})$ is the pruning-error probability of Theorem 1 with $\gamma$ labels pruned, $\xi_1^i = |Y_i^{1}| - \gamma_i^1 + 1$, and $\xi_2^i = |Y_i^{1}| - \gamma_i^2 + 1$ (note $\xi_2^i < \xi_1^i$); the remaining symbols are as in Theorem 1.
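The Theorem 2 increment follows directly from Theorem 1 by subtraction: for a fixed $j$, abbreviate $b_m = \binom{|Y_i^{1}|}{m} \eta^m (1-\eta)^{|Y_i^{1}|-m}$; since $\xi_2^i < \xi_1^i$,

$$
\sum_{m=\xi_2^i}^{|Y_i^{1}|} b_m - \sum_{m=\xi_1^i}^{|Y_i^{1}|} b_m = \sum_{m=\xi_2^i}^{\xi_1^i - 1} b_m,
$$

so pruning more labels adds exactly the binomial terms between the two thresholds, and the bound above collects these extra terms over all $j$.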

Experimental Results

Extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet, and PASCAL VOC validate the effectiveness of CLSP. The results show that CLSP markedly improves the performance of a wide range of deep PLL methods, especially under label-dependent and instance-dependent candidate label generation.

Conclusion

This article presented CLSP, a data-centric method for shrinking the candidate label sets of PLL examples. The method identifies potentially false labels through a "voting" mechanism over nearest neighbors in the representation space. The theoretical analysis shows that representation quality and label ambiguity strongly influence the upper bound on the pruning error, and experiments confirm that CLSP substantially boosts existing deep PLL methods, particularly on real-world datasets.
