Learning distance metrics with dimension constraints
-
Abstract: To improve classification accuracy, distance metric learning can be used to obtain a new representation of the samples in a new feature space. The Mahalanobis distance does not account for the fact that the correlations between dimensions differ from one class of samples to another. To remedy this shortcoming, a new supervised distance metric learning algorithm, Independent Discriminative Component Analysis (I-DCA), is proposed and applied to classifying motor and sensory nerves with a k-nearest-neighbor (kNN) classifier. For comparison, two existing distance metric learning algorithms, Relevant Component Analysis (RCA) and Discriminative Component Analysis (DCA), are also analyzed in detail. Experimental results show that the classification precision of the improved algorithm is nearly 45% higher than that of the Mahalanobis distance and about 15% higher than that of RCA and DCA; this improvement demonstrates the effectiveness of the new algorithm for nerve classification.
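For readers unfamiliar with this family of methods, a minimal sketch follows. It implements the classical RCA recipe (whitening the pooled within-class covariance), since RCA, DCA, and I-DCA all learn a Mahalanobis-type metric from labeled data. The function name and the use of full class labels in place of RCA's chunklets are illustrative assumptions; this is not the paper's I-DCA formulation.

```python
import numpy as np

def rca_metric(X, y):
    """Classical RCA sketch: whiten the pooled within-class covariance
    so that Euclidean distance in the transformed space equals the
    learned Mahalanobis-type distance. (Illustrative; not I-DCA.)"""
    # Center each class on its own mean and pool the residuals.
    residuals = np.vstack([X[y == c] - X[y == c].mean(axis=0)
                           for c in np.unique(y)])
    C = residuals.T @ residuals / len(residuals)  # within-class covariance
    # W = C^{-1/2} via eigendecomposition of the symmetric matrix C.
    vals, vecs = np.linalg.eigh(C)
    vals = np.clip(vals, 1e-12, None)             # guard near-singular C
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return W  # transform samples as X @ W.T before running kNN
```

Running Euclidean kNN on the transformed samples X @ W.T is then equivalent to running kNN with the learned metric on the raw samples.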
-
Fig. 3 An illustrative example of the I-DCA algorithm
(a) The fully unlabeled data set; (b) A randomly selected labeled training set; (c) The training set after the whitening transformation, with mutually independent dimensions; (d) The fully labeled data set with 3 classes; (e) The data after the Mahalanobis-distance transformation; (f) The data after the I-DCA transformation.
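Panel (c) depicts the whitening step that decorrelates the training dimensions. The snippet below is a quick numerical check of that step; the 2-D covariance matrix and sample size are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D training sample, standing in for panel (b).
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.2], [1.2, 1.0]], size=500)
C = np.cov(X, rowvar=False)
vals, vecs = np.linalg.eigh(C)
W = vecs @ np.diag(vals ** -0.5) @ vecs.T      # whitening matrix C^{-1/2}
Xw = (X - X.mean(axis=0)) @ W.T
# Covariance of the whitened data is ~identity: the dimensions are
# decorrelated, as depicted in panel (c).
print(np.cov(Xw, rowvar=False).round(3))
```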
Tab. 1 Data class names and sample counts

Label  Class name                Count
1      Motor_axone               2000
2      Motor_medullary_sheath    2000
3      Sensory_axone             2000
4      Sensory_medullary_sheath  2000

Tab. 2 Overall kNN classification precision comparison (%)

k     train|test  Euclidean  Mahalanobis  I-DCA  RCA    DCA
k=1   10|90       86.50      53.61        86.28  69.92  71.83
      30|70       94.16      59.41        93.87  74.42  77.33
      50|50       96.30      62.42        95.98  75.81  78.31
k=5   10|90       81.72      53.26        81.47  72.07  73.07
      30|70       90.34      61.70        90.26  77.16  78.27
      50|50       93.65      65.09        93.38  79.59  79.53
k=10  10|90       77.66      51.67        77.16  70.81  73.12
      30|70       86.79      58.77        86.63  75.81  77.22
      50|50       90.63      63.81        90.57  77.62  79.32
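The evaluation protocol behind Tab. 2 can be sketched as follows. The helper assumes the learned metric is supplied as a linear transform W (None recovers the plain Euclidean baseline) and uses scikit-learn's train_test_split and KNeighborsClassifier as stand-ins for whatever tooling the authors actually used.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def knn_precision_table(X, y, W=None, seed=0):
    """Sketch of the Tab. 2 protocol: kNN accuracy for one metric
    (encoded as a linear transform W) over three train/test splits
    and k in {1, 5, 10}."""
    Z = X if W is None else X @ W.T          # None -> Euclidean baseline
    for k in (1, 5, 10):
        for frac in (0.1, 0.3, 0.5):
            Z_tr, Z_te, y_tr, y_te = train_test_split(
                Z, y, train_size=frac, stratify=y, random_state=seed)
            acc = KNeighborsClassifier(n_neighbors=k).fit(Z_tr, y_tr).score(Z_te, y_te)
            print(f"k={k:>2}  {int(frac*100)}|{int((1-frac)*100)}: {100*acc:.2f}%")
```

In a faithful reproduction, the transform W would be re-learned on each training split before evaluation rather than fixed in advance.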