Feature Selection for Ranking — Data Mining Courseware (Feature-Selection-for-Ranking-数据挖掘课件.ppt)
ABSTRACT

In reality, many feature selection methods developed for classification are applied directly to ranking. We argue that, because of the striking differences between ranking and classification, it is better to develop feature selection methods dedicated to ranking. In this paper we propose a new feature selection method: we define the ranking accuracy of a feature, in terms of a performance measure or a loss function, as the importance of the feature, and the correlation between the ranking results of two features as the similarity between them. We also demonstrate how to solve the resulting optimization problem in an efficient way. We have tested the effectiveness of our feature selection method on two information retrieval datasets and with two ranking models, Ranking SVM and RankNet.

1. INTRODUCTION

First, feature selection has been shown to enhance accuracy in many machine learning problems, which strongly suggests that feature selection is also necessary for ranking. Second, feature selection can also help improve the efficiency of training. In information retrieval, and especially in web search, the data size is usually very large, so training ranking models is computationally costly.
There have been no feature selection methods proposed specifically for ranking; most of the methods used in ranking were developed for classification. Feature selection methods in classification fall into three categories. The first category, named "filter", treats feature selection as a preprocessing step that is independent of learning; information gain (IG) and chi-square (CHI) are among the most effective filter methods for classification. The second category, referred to as "wrapper", uses the learning system as a black box to score subsets of features. The third category, called the "embedded" method, performs feature selection within the process of training.

Several problems arise when these feature selection methods are applied to ranking; existing feature selection methods for classification are not well suited to ranking. First, in ranking a number of ordered categories are used, representing the ranking relationship between instances, whereas in classification the categories are "flat". Second, the evaluation measures used in ranking problems, e.g. mean average precision (MAP) and normalized discounted cumulative gain (NDCG), are different from the measures used in classification: 1) in ranking, precision is usually more important than recall, while in classification both precision and recall are important; 2) in ranking, correctly ranking the top-n instances is most critical, while in classification making a correct decision is of equal significance for all instances.
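To make the contrast with classification measures concrete, the following sketch computes NDCG@k for a single ranked list of graded relevance labels, using the common gain/discount form (2^rel − 1, log2 discount). The function name and the toy relevance grades are illustrative assumptions, not taken from the slides.

```python
import math

def ndcg_at_k(relevance, k):
    """NDCG@k for one ranked list of graded relevance labels."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevance, reverse=True))  # best possible ordering
    return dcg(relevance) / ideal if ideal > 0 else 0.0

# Swapping two documents near the top of the list changes NDCG much more than
# a swap further down, which is why top-n correctness matters in ranking.
print(ndcg_at_k([3, 0, 2, 1, 0], k=5))
print(ndcg_at_k([0, 3, 2, 1, 0], k=5))
```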
The novel feature selection method for ranking has the following properties:
1) It makes use of ranking information, instead of simply viewing the ranks as flat categories.
2) Inspired by earlier work, it considers the similarities between features and tries to avoid selecting redundant features.
3) It models feature selection for ranking as a multi-objective optimization problem; the final objective is to find a set of features with maximum total importance and minimum total similarity.
4) It provides a greedy search algorithm to solve the optimization problem, and the resulting solution is proven to be equivalent to the optimal solution of the original problem under a certain condition.
2. FEATURE SELECTION METHOD

Overview
Suppose the goal is to select t (1 ≤ t ≤ m) features from the entire feature set {f1, f2, ..., fm}. In our method we first define an importance score for each feature and a similarity score between any two features. We then employ an efficient algorithm to maximize the total importance score and minimize the total similarity score of the selected set of features.
2.2 Importance of feature
We first assign an importance score to each feature. Specifically, we propose using an evaluation measure such as MAP or NDCG, or a loss function (e.g. pair-wise ranking errors), to compute the importance score. In the former case, we rank the instances using the feature alone, evaluate the performance in terms of the measure, and take the evaluation result as the importance score. In the latter case, we also rank the instances using the feature, and then take a score inversely proportional to the corresponding loss as the importance score.
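As a concrete illustration of the evaluation-measure variant of the importance score, the sketch below ranks each query's documents by a single feature's value and averages a relevance measure over queries. The data layout (per-query lists of feature values and binary relevance labels) and the use of average precision are assumptions made for the example, not specifics from the slides.

```python
def average_precision(relevances):
    """Average precision of a ranked list of 0/1 relevance labels."""
    hits, score = 0, 0.0
    for i, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            score += hits / i
    return score / hits if hits else 0.0

def importance_score(feature_values, relevances):
    """Importance of one feature: rank each query's documents by the feature
    value alone, evaluate the resulting ranking, and average over queries.

    feature_values, relevances: lists over queries; each element is a list
    with one entry per document of that query.
    """
    scores = []
    for values, rels in zip(feature_values, relevances):
        order = sorted(range(len(values)), key=lambda d: values[d], reverse=True)
        scores.append(average_precision([rels[d] for d in order]))
    return sum(scores) / len(scores)

# Two toy queries, three documents each.
print(importance_score([[0.9, 0.2, 0.5], [0.1, 0.8, 0.3]],
                       [[1, 0, 0], [0, 1, 1]]))
```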
2.3 Similarity between features
In this work, we measure the similarity between any two features on the basis of their ranking results. That is, we regard each feature as a ranking model, and the similarity between two features is represented by the similarity between the ranking results that they produce. We choose Kendall's τ as an example: the Kendall's τ value of query q for any two features is computed from the ranking lists that the two features produce for that query.
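In its standard form over document pairs, Kendall's τ for query q can be written as follows; the notation P_q for the set of document pairs of query q is assumed here rather than taken from the slides.

```latex
% Kendall's tau between the rankings induced by features f_k and f_l on query q.
% P_q denotes the set of document pairs of query q (assumed notation).
\tau_q(f_k, f_l) =
  \frac{\#\{(d_s, d_t) \in P_q : f_k \text{ and } f_l \text{ order them the same way}\}
        \;-\; \#\{(d_s, d_t) \in P_q : f_k \text{ and } f_l \text{ order them oppositely}\}}
       {\#P_q}
```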
2.4 Optimization formulation
We want to select the features with the largest total importance score and the smallest total similarity score. Mathematically, this can be represented as the multi-objective program (1), which has two objectives: to maximize the sum of the importance scores of the individual selected features, and to minimize the sum of the similarity scores between any two selected features. We take a common approach in optimization and convert the multi-objective program into the single-objective program (2) by a linear combination of the two objectives.
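With 0-1 selection variables x_i (x_i = 1 if feature f_i is selected), importance scores w_i, similarity scores e_{i,j}, and a trade-off weight c — notation assumed here for illustration — formulations of the kind described above would read roughly as follows.

```latex
% (1) Multi-objective form: maximize total importance, minimize total similarity.
\max_{x} \sum_{i=1}^{m} w_i x_i, \qquad
\min_{x} \sum_{i=1}^{m} \sum_{j \neq i} e_{i,j}\, x_i x_j, \qquad
\text{s.t. } \sum_{i=1}^{m} x_i = t,\; x_i \in \{0, 1\}.

% (2) Single-objective form obtained by linear combination with weight c > 0.
\max_{x} \; \sum_{i=1}^{m} w_i x_i \;-\; c \sum_{i=1}^{m} \sum_{j \neq i} e_{i,j}\, x_i x_j,
\qquad \text{s.t. } \sum_{i=1}^{m} x_i = t,\; x_i \in \{0, 1\}.
```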
2.5 Solution to optimization problem
The optimization in (2) is a typical 0-1 integer programming problem. One possible approach would be exhaustive search, but its time complexity is combinatorial in the number of features and therefore too high for real applications; we need a more practical solution. In this work, we propose a greedy search algorithm for tackling the issue.

Algorithm GAS (Greedy search Algorithm of feature Selection)
1. Construct an undirected graph G0 in which each node represents a feature, the weight of a node is the importance score of the corresponding feature, and the weight of the edge between two nodes is the similarity score between the corresponding features.
2. Construct a set S to contain the selected features; initially S0 = ∅.
3. For i = 1, ..., t:
   (1) Select the node with the largest weight and add the corresponding feature to S.
   (2) Apply a punishment to the weights of the remaining nodes according to their similarity with the selected node, and remove the selected node together with its edges from the graph.
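A minimal sketch of the greedy selection just described, assuming the punishment step subtracts the similarity to the selected node scaled by 2c, in line with the linear combination above; the scaling, function name, and toy data are assumptions rather than specifics from the slides.

```python
def gas(importance, similarity, t, c=0.5):
    """Greedy feature selection sketch.

    importance: per-feature importance scores (node weights).
    similarity: symmetric matrix similarity[i][j] (edge weights).
    t: number of features to select.
    c: trade-off weight between importance and similarity.
    """
    weights = list(importance)          # working copy of node weights
    remaining = set(range(len(weights)))
    selected = []
    for _ in range(t):
        # (1) pick the remaining node with the largest current weight
        best = max(remaining, key=lambda i: weights[i])
        selected.append(best)
        remaining.remove(best)
        # (2) punish the remaining nodes according to their similarity
        #     with the selected node, then drop the node and its edges
        for j in remaining:
            weights[j] -= 2 * c * similarity[best][j]
    return selected

# Toy example: 4 features, feature 1 is nearly redundant with feature 0.
imp = [0.9, 0.85, 0.6, 0.3]
sim = [[0.0, 0.95, 0.1, 0.2],
       [0.95, 0.0, 0.1, 0.2],
       [0.1, 0.1, 0.0, 0.3],
       [0.2, 0.2, 0.3, 0.0]]
print(gas(imp, sim, t=2))  # [0, 2]: the redundant feature 1 is punished
```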