Configuring MATLAB to Call Weka (Repost), with an Example

Reposted from:
http://blog.sina.com.cn/s/blog_890c6aa30101av9x.html

Checking the Java version from the MATLAB command line

version -java
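
The same check can be made from a script before any Java class is loaded. This is a minimal sketch, assuming MATLAB was started with its JVM enabled; usejava and version('-java') are standard MATLAB functions, and the variable name javaVersion is only illustrative.

```matlab
% Confirm that the JVM is available before trying to load any Java classes.
if usejava('jvm')
    javaVersion = version('-java');   % text describing the JVM MATLAB is using
    fprintf('MATLAB is using: %s\n', javaVersion);
else
    javaVersion = '';
    warning('MATLAB was started without a JVM; Java libraries cannot be loaded.');
end
```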

Configuring MATLAB to call a Java library

  1. Write the Java code.
  2. Build a Java library file, i.e., a .jar file.
  3. Put the .jar file in one of the directories MATLAB uses for storing libraries, and add its path to the
    MATLAB configuration file, $MATLABINSTALLDIR\$MatlabVersion\toolbox\local\classpath.txt.
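
As an alternative to editing classpath.txt, a .jar can be added for the current session only with javaaddpath, the same call the example code later in this post uses for weka.jar. The following is a sketch under assumptions: mylib.jar is a hypothetical file name, and the exist check is there only so the snippet does nothing when the file is absent.

```matlab
% Hypothetical location of the library; replace with the real path.
jarPath = fullfile(pwd, 'mylib.jar');

if exist(jarPath, 'file')
    javaaddpath(jarPath);                 % add to the dynamic class path
end

% List the dynamic class path to confirm what has been added this session.
dynamicPath = javaclasspath('-dynamic');
disp(dynamicPath);
```

Entries in classpath.txt are read once at startup, whereas javaaddpath entries last only for the session. Recent MATLAB documentation also describes a javaclasspath.txt file in the preferences folder for static entries, so it is worth checking the documentation of the installed release.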

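Configuring MATLAB to call Weka

  1. Download Weka.
  2. Install Weka.
  3. Add the absolute path of the bin folder of jre6 (or whichever JRE is installed) to Path under System variables in the environment variables, e.g.:
    C:\Program Files\Java\jre1.8.0_77\bin;
  4. Locate the MATLAB configuration file classpath.txt:
    which classpath.txt % this command shows where classpath.txt is located
  5. Edit the configuration file classpath.txt:
    edit classpath.txt
    Add the absolute path of weka.jar in the Weka installation directory to classpath.txt, e.g.:
    C:\Program Files\Weka-3-8\weka.jar
  6. Restart MATLAB.
  7. Run the following command:
    attributes = javaObject('weka.core.FastVector');
    % if MATLAB reports no error, the configuration succeeded

  8. When MATLAB calls Weka classes, running out of Java heap space is common, so a larger heap is needed. To set it:
    MATLAB -> File -> Preferences -> General -> Java Heap Memory, then choose a suitable value.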
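
Steps 7 and 8 can also be checked without creating a Weka object. This is a minimal sketch, assuming only standard MATLAB and Java calls: exist(..., 'class') and java.lang.Runtime are not part of the original post, and the variable names are illustrative.

```matlab
% exist(..., 'class') returns 8 when the name resolves to a class MATLAB can load.
wekaVisible = (exist('weka.core.Instances', 'class') == 8);
if wekaVisible
    disp('Weka classes are visible to MATLAB.');
else
    disp('Weka classes were not found; check the class path entry for weka.jar.');
end

% Report the maximum heap the JVM may use, in megabytes.
if usejava('jvm')
    rt = java.lang.Runtime.getRuntime();
    maxHeapMB = double(rt.maxMemory()) / 2^20;
    fprintf('Maximum Java heap: %.0f MB\n', maxHeapMB);
end
```

If the reported heap is small compared with the size of the data set, raising the Java Heap Memory preference before running the boosting code is the likely remedy.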

Example: calling Weka from MATLAB
The code comes from:
http://cn.mathworks.com/matlabcentral/fileexchange/37311-smoteboost
http://www.mathworks.com/matlabcentral/fileexchange/37315-rusboost

clc;
clear all;
close all;

file = 'data.csv'; % Dataset

% Reading training file
data = dlmread(file);
label = data(:,end);

% Extracting positive data points
idx = (label==1);
pos_data = data(idx,:);
row_pos = size(pos_data,1);

% Extracting negative data points
neg_data = data(~idx,:);
row_neg = size(neg_data,1);

% Random permutation of positive and negative data points
p = randperm(row_pos);
n = randperm(row_neg);

% 80-20 split for training and test
tstpf = p(1:round(row_pos/5));
tstnf = n(1:round(row_neg/5));
trpf = setdiff(p, tstpf);
trnf = setdiff(n, tstnf);

train_data = [pos_data(trpf,:);neg_data(trnf,:)];
test_data = [pos_data(tstpf,:);neg_data(tstnf,:)];

% Decision Tree
prediction = SMOTEBoost(train_data,test_data,'tree',false);
disp ('    Label   Probability');
disp ('-----------------------------');
disp (prediction);
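
The per-class 80-20 split in the script above can be tried without data.csv or Weka. The following is a sketch on synthetic data, not the original author's code: the matrix, the fixed seed, and the variable names posIdx, negIdx, trainRows and testRows are all assumptions made for illustration.

```matlab
% Synthetic stand-in for data.csv: 4 features plus a 0/1 label in the last column.
rng(0);                                   % fixed seed so the split is repeatable
data = [rand(50, 4), [ones(10, 1); zeros(40, 1)]];

isPos  = (data(:, end) == 1);
posIdx = find(isPos);
negIdx = find(~isPos);

% Shuffle each class separately, then hold out one fifth of each for testing.
posIdx = posIdx(randperm(numel(posIdx)));
negIdx = negIdx(randperm(numel(negIdx)));
nPosTest = round(numel(posIdx) / 5);
nNegTest = round(numel(negIdx) / 5);

testRows  = [posIdx(1:nPosTest);     negIdx(1:nNegTest)];
trainRows = [posIdx(nPosTest+1:end); negIdx(nNegTest+1:end)];

train_data = data(trainRows, :);
test_data  = data(testRows, :);
fprintf('train: %d rows, test: %d rows\n', size(train_data, 1), size(test_data, 1));
```

Splitting each class separately keeps the minority-class ratio the same in both parts. Unlike the setdiff calls in the script, this version takes the remainder of the shuffled indices, so the training rows are not sorted.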

function prediction = SMOTEBoost (TRAIN,TEST,WeakLearn,ClassDist)
% This function implements the SMOTEBoost Algorithm. For more details on the
% theoretical description of the algorithm please refer to the following
% paper:
% N.V. Chawla, A.Lazarevic, L.O. Hall, K. Bowyer, "SMOTEBoost: Improving
% Prediction of Minority Class in Boosting", Journal of Knowledge Discovery
% in Databases: PKDD, 2003.
% Input: TRAIN = Training data as matrix
%        TEST = Test data as matrix
%        WeakLearn = String to choose algorithm. Choices are
%                    'svm','tree','knn' and 'logistic'.
%        ClassDist = true or false. true indicates that the class
%                    distribution is maintained while doing weighted
%                    resampling and before SMOTE is called at each
%                    iteration. false indicates that the class distribution
%                    is not maintained while resampling.
% Output: prediction = size(TEST,1)x 2 matrix. Col 1 is class labels for
%                      all instances. Col 2 is probability of the instances
%                      being classified as positive class.

javaaddpath('weka.jar');

%% Training SMOTEBoost
% Total number of instances in the training set
m = size(TRAIN,1);
POS_DATA = TRAIN(TRAIN(:,end)==1,:);
NEG_DATA = TRAIN(TRAIN(:,end)==0,:);
pos_size = size(POS_DATA,1);
neg_size = size(NEG_DATA,1);

% Reorganize TRAIN by putting all the positive and negative examples
% together, respectively.
TRAIN = [POS_DATA;NEG_DATA];

% Converting training set into Weka compatible format
CSVtoARFF (TRAIN, 'train', 'train');
train_reader = javaObject('java.io.FileReader', 'train.arff');
train = javaObject('weka.core.Instances', train_reader);
train.setClassIndex(train.numAttributes() - 1);

% Total number of iterations of the boosting method
T = 10;

% W stores the weights of the instances in each row for every iteration of
% boosting. Weights for all the instances are initialized by 1/m for the
% first iteration.
W = zeros(1,m);
for i = 1:m
    W(1,i) = 1/m;
end

% L stores pseudo loss values, H stores hypotheses, B stores log(1/beta)
% values that are used as the weight of each hypothesis while forming the
% final hypothesis. All of the following are of length <=T and store
% values for every iteration of the boosting process.
L = [];
H = {};
B = [];

% Loop counter
t = 1;

% Keeps count of the number of times the same boosting iteration has been
% repeated
count = 0;

% Boosting T iterations
while t <= T

    % LOG MESSAGE
    disp (['Boosting iteration #' int2str(t)]);

    if ClassDist == true

        % Resampling POS_DATA with weights of positive examples
        POS_WT = zeros(1,pos_size);
        sum_POS_WT = sum(W(t,1:pos_size));
        for i = 1:pos_size
            POS_WT(i) = W(t,i)/sum_POS_WT ;
        end
        RESAM_POS = POS_DATA(randsample(1:pos_size,pos_size,true,POS_WT),:);

        % Resampling NEG_DATA with weights of negative examples
        NEG_WT = zeros(1,neg_size);
        sum_NEG_WT = sum(W(t,pos_size+1:m));
        for i = 1:neg_size
            NEG_WT(i) = W(t,pos_size+i)/sum_NEG_WT ;
        end
        RESAM_NEG = NEG_DATA(randsample(1:neg_size,neg_size,true,NEG_WT),:);

        % Resampled TRAIN is stored in RESAMPLED
        RESAMPLED = [RESAM_POS;RESAM_NEG];

        % Calculating the percentage of boosting the positive class. 'pert'
        % is used as a parameter of SMOTE
        pert = ((neg_size-pos_size)/pos_size)*100;
    else
        % Indices of resampled train
        RND_IDX = randsample(1:m,m,true,W(t,:));

        % Resampled TRAIN is stored in RESAMPLED
        RESAMPLED = TRAIN(RND_IDX,:);

        % Calculating the percentage of boosting the positive class. 'pert'
        % is used as a parameter of SMOTE
        pos_size = sum(RESAMPLED(:,end)==1);
        neg_size = sum(RESAMPLED(:,end)==0);
        pert = ((neg_size-pos_size)/pos_size)*100;
    end

    % Converting resampled training set into Weka compatible format
    CSVtoARFF (RESAMPLED,'resampled','resampled');
    reader = javaObject('java.io.FileReader','resampled.arff');
    resampled = javaObject('weka.core.Instances',reader);
    resampled.setClassIndex(resampled.numAttributes()-1);

    % New SMOTE boosted data gets stored in S
    smote = javaObject('weka.filters.supervised.instance.SMOTE');
    pert = ((neg_size-pos_size)/pos_size)*100;
    smote.setPercentage(pert);
    smote.setInputFormat(resampled);
    S = weka.filters.Filter.useFilter(resampled, smote);

    % Training a weak learner. 'pred' is the weak hypothesis. However, the
    % hypothesis function is encoded in 'model'.
    switch WeakLearn
        case 'svm'
            model = javaObject('weka.classifiers.functions.SMO');
        case 'tree'
            model = javaObject('weka.classifiers.trees.J48');
        case 'knn'
            model = javaObject('weka.classifiers.lazy.IBk');
            model.setKNN(5);
        case 'logistic'
            model = javaObject('weka.classifiers.functions.Logistic');
    end
    model.buildClassifier(S);

    pred = zeros(m,1);
    for i = 0 : m - 1
        pred(i+1) = model.classifyInstance(train.instance(i));
    end

    % Computing the pseudo loss of hypothesis 'model'
    loss = 0;
    for i = 1:m
        if TRAIN(i,end)==pred(i)
            continue;
        else
            loss = loss + W(t,i);
        end
    end

    % If count exceeds a pre-defined threshold (5 in the current
    % implementation), the loop is broken and rolled back to the state
    % where loss > 0.5 was not encountered.
    if count > 5
        L = L(1:t-1);
        H = H(1:t-1);
        B = B(1:t-1);
        disp ('          Too many iterations have loss > 0.5');
        disp ('          Aborting boosting...');
        break;
    end

    % If the loss is greater than 1/2, it means that an inverted
    % hypothesis would perform better. In such cases, do not take that
    % hypothesis into consideration and repeat the same iteration. 'count'
    % keeps count of the number of times the same boosting iteration has
    % been repeated
    if loss > 0.5
        count = count + 1;
        continue;
    else
        count = 1;
    end

    L(t) = loss; % Pseudo-loss at each iteration
    H{t} = model; % Hypothesis function
    beta = loss/(1-loss); % Setting weight update parameter 'beta'.
    B(t) = log(1/beta); % Weight of the hypothesis

    % At the final iteration there is no need to update the weights any
    % further
    if t==T
        break;
    end

    % Updating weight
    for i = 1:m
        if TRAIN(i,end)==pred(i)
            W(t+1,i) = W(t,i)*beta;
        else
            W(t+1,i) = W(t,i);
        end
    end

    % Normalizing the weight for the next iteration
    sum_W = sum(W(t+1,:));
    for i = 1:m
        W(t+1,i) = W(t+1,i)/sum_W;
    end

    % Incrementing loop counter
    t = t + 1;
end

% The final hypothesis is calculated and tested on the test set
% simultaneously.

%% Testing SMOTEBoost
n = size(TEST,1); % Total number of instances in the test set

CSVtoARFF(TEST,'test','test');
test = 'test.arff';
test_reader = javaObject('java.io.FileReader', test);
test = javaObject('weka.core.Instances', test_reader);
test.setClassIndex(test.numAttributes() - 1);

% Normalizing B
sum_B = sum(B);
for i = 1:size(B,2)
    B(i) = B(i)/sum_B;
end

prediction = zeros(n,2);

for i = 1:n
    % Calculating the total weight of the class labels from all the models
    % produced during boosting
    wt_zero = 0;
    wt_one = 0;
    for j = 1:size(H,2)
        p = H{j}.classifyInstance(test.instance(i-1));
        if p==1
            wt_one = wt_one + B(j);
        else
            wt_zero = wt_zero + B(j);
        end
    end

    if (wt_one > wt_zero)
        prediction(i,:) = [1 wt_one];
    else
        prediction(i,:) = [0 wt_one];
    end
end
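
The pseudo-loss, weight update and hypothesis weight used inside the boosting loop can be followed on a toy example without Weka. This is a vectorized sketch of one iteration, assuming made-up labels and predictions; the names labels, wrong, wNext and voteWeight are illustrative and do not appear in SMOTEBoost.m.

```matlab
% Toy example: 5 training instances with uniform initial weights.
labels = [1; 1; 0; 0; 0];
pred   = [1; 0; 0; 0; 1];         % hypothetical weak-learner output
w      = ones(1, 5) / 5;

% Pseudo-loss: total weight of the misclassified instances.
wrong = (labels ~= pred)';        % row vector to match w
loss  = sum(w(wrong));            % 2 of 5 equally weighted instances are wrong

% Correctly classified instances are scaled by beta < 1, then renormalized.
beta  = loss / (1 - loss);
wNext = w;
wNext(~wrong) = wNext(~wrong) * beta;
wNext = wNext / sum(wNext);

voteWeight = log(1 / beta);       % weight of this hypothesis in the final vote
disp(wNext);
```

After this update the misclassified instances always carry half of the total weight, which is why the next weak learner is pushed toward the examples the current one got wrong.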

function r = CSVtoARFF (data, relation, type)
% csv to arff file converter

% load the csv data
[rows, cols] = size(data);

% open the arff file for writing
farff = fopen(strcat(type,'.arff'), 'w');

% print the relation part of the header
fprintf(farff, '@relation %s', relation);

% Reading from the ARFF header.
% Note: the first line of ARFFheader.txt is read here and not written out.
fid = fopen('ARFFheader.txt','r');
tline = fgets(fid);
while ischar(tline)
    tline = fgets(fid);
    % fgets returns -1 at end of file; write only actual text lines
    if ischar(tline)
        fprintf(farff,'%s',tline);
    end
end
fclose(fid);

% Converting the data
for i = 1 : rows
    % print the attribute values for the data point
    for j = 1 : cols - 1
        if data(i,j) ~= -1 % check if it is a missing value
            fprintf(farff, '%d,', data(i,j));
        else
            fprintf(farff, '?,');
        end
    end
    % print the label for the data point
    fprintf(farff, '%d\n', data(i,end));
end

% close the file
fclose(farff);

r = 0;
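
CSVtoARFF depends on an ARFFheader.txt file that the post does not show. As an illustration of the ARFF layout it has to produce, the sketch below writes a complete file, header included, for a tiny matrix. Everything in it is an assumption made for the example: the attribute names f1 and f2, the {0,1} class declaration, the demo.arff file name, and the use of %g instead of %d for feature values.

```matlab
% Small numeric matrix: two features and a 0/1 class label in the last column.
data = [5.1 3.5 1; 4.9 -1 0; 6.2 2.9 1];   % -1 marks a missing value, as in CSVtoARFF
arffName = fullfile(tempdir, 'demo.arff');

fid = fopen(arffName, 'w');
fprintf(fid, '@relation demo\n\n');
for j = 1:size(data, 2) - 1
    fprintf(fid, '@attribute f%d numeric\n', j);    % hypothetical attribute names
end
fprintf(fid, '@attribute class {0,1}\n\n@data\n');  % nominal class attribute
for i = 1:size(data, 1)
    for j = 1:size(data, 2) - 1
        if data(i, j) ~= -1
            fprintf(fid, '%g,', data(i, j));
        else
            fprintf(fid, '?,');                     % ARFF marker for a missing value
        end
    end
    fprintf(fid, '%d\n', data(i, end));
end
fclose(fid);
type(arffName);
```

Generating the header from the matrix avoids keeping a separate header file in sync with the number of columns, at the cost of generic attribute names.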

function model = ClassifierTrain(data,type)
% Training the classifier that would do the sample selection

javaaddpath('weka.jar');

CSVtoARFF(data,'train','train');
train_file = 'train.arff';
reader = javaObject('java.io.FileReader', train_file);
train = javaObject('weka.core.Instances', reader);
train.setClassIndex(train.numAttributes() - 1);
% options = javaObject('java.lang.String');

switch type
    case 'svm'
        model = javaObject('weka.classifiers.functions.SMO');
        kernel = javaObject('weka.classifiers.functions.supportVector.RBFKernel');
        model.setKernel(kernel);
    case 'tree'
        model = javaObject('weka.classifiers.trees.J48');
        % options = weka.core.Utils.splitOptions('-C 0.2');
        % model.setOptions(options);
    case 'knn'
        model = javaObject('weka.classifiers.lazy.IBk');
        model.setKNN(5);
    case 'logistic'
        model = javaObject('weka.classifiers.functions.Logistic');
end

model.buildClassifier(train);

function prediction = ClassifierPredict(data,model)
% Predicting the labels of the test instances
% Input: data = test data
%        model = the trained model
% Output: prediction = prediction labels

javaaddpath('weka.jar');

CSVtoARFF(data,'test','test');
test_file = 'test.arff';
reader = javaObject('java.io.FileReader', test_file);
test = javaObject('weka.core.Instances', reader);
test.setClassIndex(test.numAttributes() - 1);

prediction = [];
for i = 0 : size(data,1) - 1
    p = model.classifyInstance(test.instance(i));
    prediction = [prediction; p];
end
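
Once predictions are available, they can be compared with the true labels in plain MATLAB. The sketch below assumes a prediction matrix in the two-column form SMOTEBoost returns (label, probability of the positive class); the numbers and the actual vector are invented for illustration, and the metric names are not part of the original code.

```matlab
% Hypothetical outputs: column 1 = predicted label, column 2 = P(positive).
prediction = [1 0.9; 0 0.2; 1 0.6; 0 0.4; 0 0.1];
actual     = [1; 0; 0; 1; 0];          % true labels of the same five test rows

predicted = prediction(:, 1);
tp = sum(predicted == 1 & actual == 1);
fp = sum(predicted == 1 & actual == 0);
fn = sum(predicted == 0 & actual == 1);
tn = sum(predicted == 0 & actual == 0);

accuracy  = (tp + tn) / numel(actual);
recall    = tp / (tp + fn);            % share of actual positives that were found
precision = tp / (tp + fp);            % share of predicted positives that are correct
fprintf('accuracy %.2f, recall %.2f, precision %.2f\n', accuracy, recall, precision);
```

On imbalanced data, recall and precision for the minority class are usually more informative than accuracy alone, since a classifier that always predicts the majority class can still score a high accuracy.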
