Machine Learning Attributes
Today we will look at the Attribute-Relation File Format (ARFF) for machine learning in Java, and we will write a small Java program that converts the widely used .csv file format into .arff. The format was developed by the computer science department of the University of Waikato. As the name suggests, the file contains a list of attributes and one class attribute. An .arff file is broadly divided into two portions:
Header field
Data field
Let us now look at each of these fields in detail.
1) Header field
The header field gives the name of the relation, the names of the attributes, and the datatype of each attribute in the data file. The main difference between .csv and .arff files is that in a .csv file the values of the attributes appear directly below the attribute names, whereas in an .arff file the attribute names are declared separately, followed by the data in a separate data field. The basic syntax for declaring an attribute in the header portion is as follows:
@attribute <attribute-name> <datatype>
The image below shows an example of the .arff file format:
The example is a data set that contains the head-brain relation for various people. From the picture above, one can easily identify the number of attributes and the type of data each one contains. In our example, the data in all four attributes is numeric. Besides numeric, an attribute's data type can be nominal, string, or date.
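To make the header syntax concrete, here is a minimal sketch in plain Java (no Weka needed) that prints a header for a relation with four numeric attributes. The attribute names and the class name ArffHeaderSketch are hypothetical and chosen only for illustration; the real names come from the header row of your own .csv file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ArffHeaderSketch {

    // Builds the header portion of an ARFF file: one @relation line,
    // a blank line, then one @attribute line per attribute in insertion order.
    static String buildHeader(String relation, Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            sb.append("@attribute ")
              .append(entry.getKey())
              .append(' ')
              .append(entry.getValue())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical attribute names (an assumption, not taken from the article).
        // LinkedHashMap keeps the attributes in the order they were added.
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("gender", "numeric");
        attributes.put("age_range", "numeric");
        attributes.put("head_size", "numeric");
        attributes.put("brain_weight", "numeric");
        // A nominal attribute would list its allowed values instead, e.g. "{yes,no}".

        System.out.print(buildHeader("headbrain", attributes));
    }
}
```

Each line of the result follows the @attribute <attribute-name> <datatype> pattern, which is the part that a .csv file has no way to express.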
2) Data field
This field contains the data values of the attributes declared in the header field. These are the values our model uses to make predictions and to measure how accurate its results are. The values are separated by commas and listed under the @data heading. As noted for the header field, the data can be of the following types:
Numerical
Nominal
String
Date-time format
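As a rough illustration of how values of these types end up as comma-separated rows under @data, here is a small plain-Java sketch. The class name ArffRowSketch, the sample values, and the quoting rule are my own simplifications and not Weka's implementation: values containing whitespace or special characters are wrapped in single quotes, and a missing value is written as ?.

```java
import java.util.StringJoiner;
import java.util.regex.Pattern;

public class ArffRowSketch {

    // Characters that make a value unsafe to write without quotes (simplified rule).
    private static final Pattern SPECIAL = Pattern.compile("[\\s,{}%'\"]");

    // Formats a single value for the @data section.
    // null becomes "?", which ARFF uses to mark a missing value.
    static String formatValue(Object value) {
        if (value == null) {
            return "?";
        }
        String text = value.toString();
        if (!text.isEmpty() && !SPECIAL.matcher(text).find()) {
            return text; // plain numbers and simple nominal labels need no quotes
        }
        // Escape backslashes first, then single quotes, and wrap in single quotes.
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Joins the formatted values of one instance with commas.
    static String formatRow(Object... values) {
        StringJoiner row = new StringJoiner(",");
        for (Object value : values) {
            row.add(formatValue(value));
        }
        return row.toString();
    }

    public static void main(String[] args) {
        // Made-up sample values: numeric, nominal, string, date-time.
        // The date text assumes the attribute was declared with a matching
        // format such as "yyyy-MM-dd HH:mm:ss".
        System.out.println("@data");
        System.out.println(formatRow(4512, "male", "first scan", "2020-01-15 10:30:00"));
        System.out.println(formatRow(3738, "female", null, "2020-01-16 09:00:00"));
    }
}
```

In practice you rarely need to write rows by hand, since Weka's ArffSaver takes care of quoting, but the sketch shows why string and date values usually appear in quotes while numeric and nominal values do not.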
The .csv file that I have used can be downloaded from here: headbrain7.csv
Below is the code, written in Java in the Eclipse IDE, that converts the .csv file into the .arff file format. Make sure you have added the weka.jar file to your build path; if you haven't, have a look at my previous article: Introduction to weka and Machine learning in Java
Code:
import java.io.File;
import java.io.IOException;

import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

public class wekaapi {
    public static void main(String[] args) throws IOException {
        // load the CSV file
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("C:\\Users\\Logan\\Desktop\\ML\\linearregression\\headbrain.csv"));
        Instances data = loader.getDataSet(); // get the Instances object

        // write the same instances out in ARFF format
        ArffSaver saver = new ArffSaver();
        saver.setInstances(data); // set the dataset we want to convert
        saver.setFile(new File("C:\\Users\\Logan\\Desktop\\ML\\headbrain.arff"));
        saver.writeBatch();

        // printing an Instances object shows its ARFF representation
        System.out.println("The .arff file format is as follows");
        System.out.println(data);
    }
}
Output
A clean display and a consistent layout of the data make .arff files a popular choice among data scientists for their analysis. That was all for today. I hope you liked this article; stay tuned for more, and have a great day ahead.
Translated from: https://www.includehelp.com/ml-ai/attribute-relation-file-format.aspx