I. The Entropy Weight Method
Also known as the entropy value method, this is an objective weighting method: it determines indicator weights from the amount of information carried by the observed values of each indicator. For implementation details in Stata, see *Stata-熵值法(熵权法)计算实现*.
II. Principle
Based on an indicator's characteristics, its entropy can be used to measure the indicator's degree of dispersion: the smaller the entropy, the greater the dispersion, and therefore the greater that indicator's influence (weight) on the comprehensive evaluation. Suppose there are $m$ samples and $n$ evaluation indicators; the initial data matrix is:

$$X=\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}$$

For a given indicator $x_j$, the larger the differences among the values $x_{ij}$ across samples, the larger the role that indicator plays in the comprehensive evaluation; if all values of an indicator are equal, it contributes nothing to the evaluation.
III. Workflow
1. Preprocessing
In a TIFF file, invalid and background pixels are usually stored as a specific fill value (such as NaN or -3.4028235e38). These values must be handled first so that they do not distort later computations; the code below marks the corresponding positions as 0 in a validity mask `tmp`:

```python
tmp[np.isclose(im_data, -3.4028235e38, atol=1e-6)] = 0
tmp[np.isnan(im_data)] = 0
```
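As a minimal, self-contained sketch of this masking step (a toy 2×2 array stands in for real TIFF data; the values are illustrative assumptions):

```python
import numpy as np

# Toy raster containing one NaN and one GDAL-style nodata value
im_data = np.array([[1.0, np.nan],
                    [-3.4028235e38, 4.0]])

# Validity mask: 1 = valid pixel, 0 = nodata/background
tmp = np.ones_like(im_data, dtype=int)
tmp[np.isclose(im_data, -3.4028235e38, atol=1e-6)] = 0
tmp[np.isnan(im_data)] = 0

# Only the valid pixels survive the mask
valid = im_data[tmp != 0]
```

Note that `np.isclose` is used rather than `==` because the nodata fill value is a float and may not compare bit-exactly after I/O.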
2. Normalization
Indicators are either positive (larger values are better) or negative (smaller values are better), so normalization must treat the two types differently; after this step, the rest of the pipeline no longer needs to care about indicator direction and a single code path works for both. For a positive indicator, the normalization formula is:

$$z_{ij}=\frac{x_{ij}-\min x_j}{\max x_j-\min x_j}$$

For a negative indicator, the normalization formula is:

$$z_{ij}=\frac{\max x_j-x_{ij}}{\max x_j-\min x_j}$$

where $z_{ij}$ denotes the normalized value of the $j$-th indicator for the $i$-th sample.
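The two formulas can be sketched on a toy indicator column (the values below are assumptions for illustration only):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 10.0])  # one indicator across four samples

# Positive indicator: larger is better
z_pos = (x - x.min()) / (x.max() - x.min())
# Negative indicator: smaller is better
z_neg = (x.max() - x) / (x.max() - x.min())
```

Both variants map the column into [0, 1]; for a negative indicator the best (smallest) value ends up at 1.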
3. Computing indicator proportions
Compute the proportion of the $i$-th sample under the $j$-th indicator:

$$p_{ij}=\frac{z_{ij}}{\sum_{i=1}^{m} z_{ij}}$$
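With NumPy broadcasting, this is a one-liner over the normalized matrix (toy values, assumed for illustration):

```python
import numpy as np

# Normalized matrix z: 3 samples (rows) x 2 indicators (columns)
z = np.array([[0.0, 1.0],
              [0.5, 0.5],
              [1.0, 0.5]])

# Proportion of each sample under each indicator; columns sum to 1
p = z / z.sum(axis=0)
```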
4. Computing entropy values
Compute the entropy of the $j$-th indicator:

$$e_j=-k\sum_{i=1}^{m} p_{ij}\ln p_{ij}$$

$$k=\frac{1}{\ln m}$$

Since $k=1/\ln m>0$ and $-\sum_{i=1}^{m} p_{ij}\ln p_{ij}$ lies between $0$ and $\ln m$, it follows that $0\le e_j\le 1$.
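A vectorized sketch (toy proportion matrix assumed for illustration; a uniform column attains the maximum entropy of 1):

```python
import numpy as np

# Proportion matrix p: m = 4 samples, 2 indicators; columns sum to 1
p = np.array([[0.25, 0.1],
              [0.25, 0.2],
              [0.25, 0.3],
              [0.25, 0.4]])

m = p.shape[0]
k = 1 / np.log(m)
# Guard against log(0) before taking logarithms
p_safe = np.where(p > 0, p, 1e-10)
# Entropy of each indicator (column)
e = -k * np.sum(p_safe * np.log(p_safe), axis=0)
```

The first column is perfectly uniform, so its entropy is 1 and it would receive zero weight; the second column is more dispersed, so its entropy is below 1.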
5. Computing information utility values
Compute the information utility value of the $j$-th indicator:

$$d_j=1-e_j$$
6. Computing indicator weights

$$w_j=\frac{d_j}{\sum_{j=1}^{n} d_j}$$
7. Computing each sample's composite score

$$s_i=\sum_{j=1}^{n} w_j z_{ij}$$
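Putting steps 2 through 7 together, the whole pipeline can be sketched end to end in vectorized NumPy (the 4×3 input matrix is an illustrative assumption, with all three columns treated as positive indicators):

```python
import numpy as np

# Raw data X: m = 4 samples, n = 3 indicators (toy values)
X = np.array([[1.0, 10.0, 5.0],
              [2.0, 20.0, 5.0],
              [3.0, 30.0, 6.0],
              [4.0, 40.0, 9.0]])

# Step 2: min-max normalization (positive indicators)
z = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Step 3: proportions of each sample under each indicator
p = z / z.sum(axis=0)

# Step 4: entropy per indicator, guarding against log(0)
m = X.shape[0]
k = 1 / np.log(m)
p_safe = np.where(p > 0, p, 1e-10)
e = -k * np.sum(p_safe * np.log(p_safe), axis=0)

# Steps 5-6: information utility and weights
d = 1 - e
w = d / d.sum()

# Step 7: composite score of each sample
s = z @ w
```

Because the weights sum to 1 and each normalized value lies in [0, 1], every composite score also lies in [0, 1].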
IV. Python Code
```python
# coding=utf-8
import os

import numpy as np
import pandas as pd
from osgeo import gdal


def read_img(filename):
    # Open the raster file
    dataset = gdal.Open(filename)
    # Raster metadata
    im_width = dataset.RasterXSize   # number of columns
    im_height = dataset.RasterYSize  # number of rows
    im_bands = dataset.RasterCount   # number of bands
    im_geotrans = dataset.GetGeoTransform()  # affine transform: top-left corner coordinates and pixel size
    im_proj = dataset.GetProjection()        # map projection, as a WKT string
    # Read the raster data into an array
    im_data = dataset.ReadAsArray(0, 0, im_width, im_height)
    # Return the projection, affine transform and data array
    return im_proj, im_geotrans, im_data


def write_img(filename, im_proj, im_geotrans, im_data, driverName="GTiff"):
    # Determine the raster data type
    if "int8" in im_data.dtype.name:
        datatype = gdal.GDT_Byte
    elif "int16" in im_data.dtype.name:
        datatype = gdal.GDT_UInt16
    else:
        datatype = gdal.GDT_Float32
    # Determine the array dimensionality
    if len(im_data.shape) == 3:
        im_bands, im_height, im_width = im_data.shape
    else:
        im_bands, (im_height, im_width) = 1, im_data.shape
    # Create the output file
    driver = gdal.GetDriverByName(driverName)
    dataset = driver.Create(filename, im_width, im_height, im_bands, datatype)
    # Write the affine transform parameters
    dataset.SetGeoTransform(im_geotrans)
    # Write the projection
    dataset.SetProjection(im_proj)
    # Write the array data
    if im_bands == 1:
        dataset.GetRasterBand(1).WriteArray(im_data)
    else:
        for i in range(im_bands):
            dataset.GetRasterBand(i + 1).WriteArray(im_data[i])


def normalize_data(arr, name):
    """Min-max normalization; in `mode`, 1 marks a positive indicator, 0 a negative one."""
    mode = {
        "GPP年变异系数": 0,
        "GPP年总值": 1,
        "干旱胁迫": 0,
        "供给服务": 1,
        "固碳释氧": 1,
        "洪涝灾害": 0,
        "景观多样性SHDI": 0,
        "景观连通度COHESION": 1,
        "景观破碎度ED": 0,
        "人口密度": 0,
        "人类干扰指数": 0,
        "水源涵养": 1,
        "土地垦殖系数": 0,
        "土壤保持": 1,
    }
    # Minimum and maximum of the indicator
    min_values = arr.min()
    max_values = arr.max()
    if mode[name] == 1:
        # Positive indicator
        normalized = (arr - min_values) / (max_values - min_values)
    else:
        # Negative indicator
        normalized = (max_values - arr) / (max_values - min_values)
    return normalized


def calculate_entropy(weights):
    """Compute the information entropy of each indicator (column)."""
    # Guard against log(0): replace non-positive values with a tiny positive number
    weights[weights <= 0] = 1e-10
    # m: number of samples
    m = weights.shape[0]
    # k = 1 / ln(m)
    k = 1 / np.log(m)
    # Entropy of each indicator
    entropy = []
    for j in range(weights.shape[1]):
        tmp = 0.0
        for i in range(m):
            tmp += weights[i][j] * np.log(weights[i][j])
        entropy.append((-k) * tmp)
    return entropy


def calculate_weights(z):
    """Compute the indicator proportions p."""
    # Column sums
    column_sums = z.sum(axis=0)
    # Proportion of each sample under each indicator
    weights = z / column_sums
    return weights


def calculate_g(e):
    """Information utility values d_j = 1 - e_j."""
    g = []
    for t in e:
        g.append(1 - t)
    return g


def calculate_w(g):
    """Normalize the utility values into weights."""
    sumG = sum(g)
    w = []
    for t in g:
        w.append(t / sumG)
    return w


def raster_index_by_sum():
    data_path = "D:/Download/数据/年份/"  # data directory, one subfolder per year
    csv_path = "D:/Download/数据/sum_entropies.csv"
    years = os.listdir(data_path)  # year folders
    total = None
    for year in years:
        files = os.listdir(data_path + year)  # indicator rasters for this year
        df = pd.DataFrame(columns=files)
        index = []  # indicator columns
        # Read one raster to get the grid shape
        im_proj, im_geotrans, im_data = read_img(data_path + year + "/" + files[0])
        tmp = np.ones_like(im_data, dtype=int)
        # Build the validity mask across all indicators
        for file in files:
            im_proj, im_geotrans, im_data = read_img(data_path + year + "/" + file)
            tmp[np.isclose(im_data, -3.4028235e38, atol=1e-6)] = 0
            tmp[np.isnan(im_data)] = 0
        for file in files:
            im_proj, im_geotrans, im_data = read_img(data_path + year + "/" + file)
            im_data = im_data[tmp != 0]
            # Strip the 4-character year prefix and the ".tif" suffix to get the indicator name
            z = normalize_data(im_data.flatten(), file[4:-4])
            index.append(z)
        # Pixel-by-indicator matrix for this year
        index = np.transpose(np.vstack(index))
        p = calculate_weights(index)   # indicator proportions
        e = calculate_entropy(p)       # entropy values
        # Accumulate entropies across years
        if total is None:
            total = e
        else:
            total = [x + y for x, y in zip(total, e)]
    g = calculate_g(total)
    w = calculate_w(g)
    df.loc[len(df)] = w
    df.to_csv(csv_path, index=False, sep=",")


def raster_index_by_years():
    data_path = "D:/Download/数据/年份/"  # data directory, one subfolder per year
    csv_path = "D:/Download/数据/years_entropies.csv"
    norm_path = "D:/Download/数据/归一化/"  # normalized rasters are written here
    if not os.path.exists(data_path):
        os.mkdir(data_path)
    if not os.path.exists(norm_path):
        os.mkdir(norm_path)
    years = os.listdir(data_path)  # year folders
    for year in years:
        if not os.path.exists(norm_path + year):
            os.mkdir(norm_path + year)
    files = os.listdir(data_path + years[0])
    # Strip the 4-character year prefix and the ".tif" suffix
    files = [file[4:-4] for file in files]
    df = pd.DataFrame(columns=files)
    # Read one raster to get the grid shape
    im_proj, im_geotrans, im_data = read_img(data_path + years[0] + "/" + years[0] + files[0] + ".tif")
    tmp = np.ones_like(im_data, dtype=int)
    # Build the validity mask across all years and indicators
    for year in years:
        for file in files:
            im_proj, im_geotrans, im_data = read_img(data_path + year + "/" + year + file + ".tif")
            tmp[np.isclose(im_data, -3.4028235e38, atol=1e-6)] = 0
            tmp[np.isnan(im_data)] = 0
    data = None
    for year in years:
        index = []
        for file in files:
            im_proj, im_geotrans, im_data = read_img(data_path + year + "/" + year + file + ".tif")
            # Also write the normalized raster to disk
            write_img(norm_path + year + "/" + year + file + ".tif", im_proj, im_geotrans,
                      normalize_data(im_data, file))
            im_data = im_data[tmp != 0]
            z = normalize_data(im_data.flatten(), file)
            index.append(z)
        # Pixel-by-indicator matrix for this year
        index = np.transpose(np.vstack(index))
        # Accumulated matrix across years (kept for reference)
        if data is None:
            data = index
        else:
            data = np.concatenate((data, index), axis=0)
        p = calculate_weights(index)   # indicator proportions
        e = calculate_entropy(p)       # entropy values
        g = calculate_g(e)
        w = calculate_w(g)
        df.loc[len(df)] = w            # one row of weights per year
    df.to_csv(csv_path, index=False, sep=",")


if __name__ == "__main__":
    # raster_index_by_sum()
    raster_index_by_years()
```