kaggle(4) Regression with an Abalone Dataset 鲍鱼数据集的回归

kaggle(4) Regression with an Abalone Dataset 鲍鱼数据集的回归

在这里插入图片描述

import pandas as pd
import numpy as npimport xgboost
import lightgbm
import optuna
import catboostfrom sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_log_error
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import VotingRegressor, StackingRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoderimport seaborn as sns
import matplotlib.pyplot as pltimport warnings
warnings.filterwarnings("ignore")

PROJECT DESCRIPTION

Predicting the age of abalone from physical measurements. The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope

PHYSICAL ATTRIBUTES

SEX: Male/Female/Infant
LENGTH: Longest shell measurement
DIAMETER: Diameter of the Abalone
HEIGHT: Height of the Abalone
WHOLE WEIGHT: Weight of the whole abalone
SHUCKED WEIGHT: Weight of the meat
VISCERA WEIGHT: Gut Weight - Interal Organs
SHELL WEIGHT: Shell Weight after drying
RINGS: Number of rings +1.5 gives Age of the Abalone

项目介绍

通过物理测量预测鲍鱼的年龄。 鲍鱼的年龄是通过将鲍鱼壳从锥体上切开、染色并通过显微镜计算环数来确定的。

物理属性

性别:男/女/婴儿
长度:最长外壳测量值
直径:鲍鱼的直径
高度:鲍鱼的高度
整体重量:整个鲍鱼的重量
去壳重量:肉的重量
内脏重量:肠道重量 - 内脏器官
壳重:干燥后的壳重
:环数+1.5给出鲍鱼的年龄

Load the Datasets 加载数据集

# original = pd.read_csv("/kaggle/input/abalone-dataset/abalone.csv")
# train = pd.read_csv("/kaggle/input/playground-series-s4e4/train.csv")
# test = pd.read_csv("/kaggle/input/playground-series-s4e4/test.csv")
original = pd.read_csv("./data/abalone.csv")
train = pd.read_csv("./data/train.csv")
test = pd.read_csv("./data/test.csv")

Make the data ready for tuning 准备好数据进行调整

train = train.drop("id", axis=1)
train=train.rename(columns={'Whole weight':'Whole weight','Whole weight.1':'Shucked weight', 'Whole weight.2':'Viscera weight', 'Shell weight':'Shell weight'})
test=test.rename(columns={'Whole weight':'Whole weight','Whole weight.1':'Shucked weight', 'Whole weight.2':'Viscera weight', 'Shell weight':'Shell weight'})
train = pd.concat([train, original], axis=0)

Get familier with the Data 熟悉数据

train.head()
SexLengthDiameterHeightWhole weightShucked weightViscera weightShell weightRings
0F0.5500.4300.1500.77150.32850.14650.240011
1F0.6300.4900.1451.13000.45800.27650.320011
2I0.1600.1100.0250.02100.00550.00300.00506
3M0.5950.4750.1500.91450.37550.20550.250010
4I0.5550.4250.1300.78200.36950.16000.19759
print(f"The shape of training dataset is : {train.shape}")
print(f"The shape of testing dataset is : {test.shape}")
The shape of training dataset is : (94792, 9)
The shape of testing dataset is : (60411, 9)
test.head()
idSexLengthDiameterHeightWhole weightShucked weightViscera weightShell weight
090615M0.6450.4750.1551.23800.61850.31250.3005
190616M0.5800.4600.1600.98300.47850.21950.2750
290617M0.5600.4200.1400.83950.35250.18450.2405
390618M0.5700.4900.1450.87400.35250.18650.2350
490619I0.4150.3250.1100.35800.15750.06700.1050
train.groupby("Sex").count()["Length"]
Sex
F    27802
I    34435
M    32555
Name: Length, dtype: int64
test.groupby("Sex").count()["Length"]
Sex
F    17387
I    22241
M    20783
Name: Length, dtype: int64
np.sort(pd.unique(train.Rings))
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29], dtype=int64)

View the Distribution 查看分布

train.hist(figsize=(12, 10), grid=True, bins=50)
plt.tight_layout()
plt.axis("off")
(0.0, 1.0, 0.0, 1.0)

在这里插入图片描述

test.hist(figsize=(12, 10), grid=True, bins=50)
plt.tight_layout()
plt.axis("off")
(0.0, 1.0, 0.0, 1.0)

在这里插入图片描述

CONTINUOUS COLUMN ANALYSIS 连续柱分析

# Set up warnings to be ignored (optional)
warnings.filterwarnings("ignore")
pd.set_option('mode.use_inf_as_na', False)train_str = train
train_str['Rings'] = train_str['Rings'].astype(str)# List of continuous variables in your dataset
continuous_vars = ['Length', 'Diameter', 'Height', 'Whole weight', 'Shucked weight', 'Viscera weight', 'Shell weight']# Set hue to your target column
target_column = 'Rings'for column in continuous_vars:fig, axes = plt.subplots(1, 2, figsize=(18, 4))  # Create subplots with 1 row and 2 columns# Plot histogram with hue and explicit labelssns.histplot(data=train_str, x=column, hue=target_column, bins=50, kde=True, ax=axes[0], palette='muted', legend=False)axes[0].set_title(f'Histogram of {column} with {target_column} Hue')axes[0].set_xlabel(column)axes[0].set_ylabel('Count')axes[0].legend(title=target_column, loc='upper right')# Plot KDE plot with hue and explicit labelssns.kdeplot(data=train_str, x=column, hue=target_column, ax=axes[1], palette='muted', legend=False)axes[1].set_title(f'KDE Plot of {column} with {target_column} Hue')axes[1].set_xlabel(column)axes[1].set_ylabel('Density')axes[1].legend(title=target_column, loc='upper right')plt.tight_layout()  # Adjust spacing between subplotsplt.show()

在这里插入图片描述

在这里插入图片描述

在这里插入图片描述

在这里插入图片描述

在这里插入图片描述

在这里插入图片描述

在这里插入图片描述

ANALYSIS BY QQ PLOT QQ图分析

import scipy.stats as stats  
def qq_plot_with_skewness(data, quantitative_var):# Check if the variable is present in the DataFrameif quantitative_var not in data.columns:print(f"Error: '{quantitative_var}' not found in the DataFrame.")returnf, ax = plt.subplots(1, 2, figsize=(18, 5.5))# Check for missing valuesif data[quantitative_var].isnull().any():print(f"Warning: '{quantitative_var}' contains missing values. Results may be affected.")# QQ plotstats.probplot(data[quantitative_var], plot=ax[0], fit=True)ax[0].set_title(f'QQ Plot for {quantitative_var}')# Skewness plotsns.histplot(data[quantitative_var], kde=True, ax=ax[1])ax[1].set_title(f'Distribution of {quantitative_var}')# Calculate skewness valueskewness_value = stats.skew(data[quantitative_var])# Display skewness value on the plotax[1].text(0.5, 0.5, f'Skewness: {skewness_value:.2f}', transform=ax[1].transAxes, horizontalalignment='center', verticalalignment='center', fontsize=16, color='red')plt.show()
# Example usage for each continuous variable
for var in continuous_vars:qq_plot_with_skewness(train, var)

在这里插入图片描述

在这里插入图片描述

在这里插入图片描述

在这里插入图片描述

在这里插入图片描述

在这里插入图片描述

在这里插入图片描述

Split the Dataset 分割数据集

sex_to_num = {"M": 0,"F": 1,"I": 2
}
train["Sex"] = train["Sex"].replace(sex_to_num.keys(), [sex_to_num[key] for key in sex_to_num])
test["Sex"] = test["Sex"].replace(sex_to_num.keys(), [sex_to_num[key] for key in sex_to_num])
train.groupby("Sex").count()["Length"]
Sex
0    32555
1    27802
2    34435
Name: Length, dtype: int64
X = train.drop("Rings", axis=1)
y = train.Rings
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
X_valid, X_test, y_valid, y_test = train_test_split(X_test, y_test, test_size=0.5, random_state=42, stratify=y_test)

Here stratify parameter keeps the ratio of Rings same all across the Dtaset

XGBoost

we will be implementing two XGBoost models

def xgb_objective(trial):params = {"eta": trial.suggest_float("eta", 0.01, 1.0),"gamma": 0.0,"max_depth": trial.suggest_int("max_depth", 3, 20),"min_child_weight": trial.suggest_float("min_child_weight", 1., 50.),"subsample": trial.suggest_float("subsample", 0.5, 1.0),"colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),"reg_lambda": trial.suggest_float("lambda", 1.0, 100.0),"n_estimators": trial.suggest_int("n_estimators", 100, 1000)}xgb_reg = TransformedTargetRegressor(xgboost.XGBRegressor(**params, objective='reg:squarederror', grow_policy='lossguide',tree_method="hist", random_state=42),func=np.log1p,inverse_func=np.expm1)xgb_reg.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)val_scores = mean_squared_log_error(y_valid, xgb_reg.predict(X_valid), squared=False)return val_scoressampler = optuna.samplers.TPESampler(seed=42)  # Using Tree-structured Parzen Estimator sampler for optimization
xgb_study = optuna.create_study(direction = 'minimize',study_name="XgbRegressor", sampler=sampler)
[I 2024-04-25 16:46:29,229] A new study created in memory with name: XgbRegressor

XGBoost 1st model

TUNE = False
if TUNE:xgb_study.optimize(xgb_objective, n_trials=500)

Set TUNE parameter to True incase you want to run Hyper Parameter Tuning

xgb_best_params_1 = {'eta': 0.1006321838798394,'max_depth': 6,'min_child_weight': 27.999752791085136,'subsample': 0.7344797943645852,'colsample_bytree': 0.5389765810810496,'lambda': 79.62358968148187,'n_estimators': 407
}
xgb_reg_1 = TransformedTargetRegressor(xgboost.XGBRegressor(**xgb_best_params_1, objective='reg:squarederror', grow_policy='lossguide',tree_method="hist", random_state=42, gamma=0.0),func=np.log1p,inverse_func=np.expm1)
xgb_reg_1.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
                       regressor=XGBRegressor(base_score=None, booster=None,callbacks=None,colsample_bylevel=None,colsample_bynode=None,colsample_bytree=0.5389765810810496,early_stopping_rounds=None,enable_categorical=False,eta=0.1006321838798394,eval_metric=None,feature_types=None, gamma=0.0,gpu_id...grow_policy=&#x27;lossguide&#x27;,importance_type=None,interaction_constraints=None,lambda=79.62358968148187,learning_rate=None,max_bin=None,max_cat_threshold=None,max_cat_to_onehot=None,max_delta_step=None,max_depth=6, max_leaves=None,min_child_weight=27.999752791085136,missing=nan,monotone_constraints=None,n_estimators=407, n_jobs=None,num_parallel_tree=None, ...))</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class="sk-container" hidden><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-1" type="checkbox" ><label for="sk-estimator-id-1" class="sk-toggleable__label sk-toggleable__label-arrow">TransformedTargetRegressor</label><div class="sk-toggleable__content"><pre>TransformedTargetRegressor(func=&lt;ufunc &#x27;log1p&#x27;&gt;, inverse_func=&lt;ufunc &#x27;expm1&#x27;&gt;,regressor=XGBRegressor(base_score=None, booster=None,callbacks=None,colsample_bylevel=None,colsample_bynode=None,colsample_bytree=0.5389765810810496,early_stopping_rounds=None,enable_categorical=False,eta=0.1006321838798394,eval_metric=None,feature_types=None, gamma=0.0,gpu_id...grow_policy=&#x27;lossguide&#x27;,importance_type=None,interaction_constraints=None,lambda=79.62358968148187,learning_rate=None,max_bin=None,max_cat_threshold=None,max_cat_to_onehot=None,max_delta_step=None,max_depth=6, max_leaves=None,min_child_weight=27.999752791085136,missing=nan,monotone_constraints=None,n_estimators=407, n_jobs=None,num_parallel_tree=None, ...))</pre></div></div></div><div class="sk-parallel"><div class="sk-parallel-item"><div class="sk-item"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-2" type="checkbox" ><label for="sk-estimator-id-2" class="sk-toggleable__label sk-toggleable__label-arrow">regressor: XGBRegressor</label><div class="sk-toggleable__content"><pre>XGBRegressor(base_score=None, booster=None, callbacks=None,colsample_bylevel=None, colsample_bynode=None,colsample_bytree=0.5389765810810496, early_stopping_rounds=None,enable_categorical=False, eta=0.1006321838798394, eval_metric=None,feature_types=None, gamma=0.0, gpu_id=None,grow_policy=&#x27;lossguide&#x27;, importance_type=None,interaction_constraints=None, lambda=79.62358968148187,learning_rate=None, max_bin=None, max_cat_threshold=None,max_cat_to_onehot=None, max_delta_step=None, max_depth=6,max_leaves=None, min_child_weight=27.999752791085136, missing=nan,monotone_constraints=None, n_estimators=407, n_jobs=None,num_parallel_tree=None, ...)</pre></div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-3" type="checkbox" ><label for="sk-estimator-id-3" class="sk-toggleable__label sk-toggleable__label-arrow">XGBRegressor</label><div class="sk-toggleable__content"><pre>XGBRegressor(base_score=None, booster=None, callbacks=None,colsample_bylevel=None, colsample_bynode=None,colsample_bytree=0.5389765810810496, early_stopping_rounds=None,enable_categorical=False, eta=0.1006321838798394, eval_metric=None,feature_types=None, gamma=0.0, gpu_id=None,grow_policy=&#x27;lossguide&#x27;, importance_type=None,interaction_constraints=None, lambda=79.62358968148187,learning_rate=None, max_bin=None, max_cat_threshold=None,max_cat_to_onehot=None, max_delta_step=None, max_depth=6,max_leaves=None, min_child_weight=27.999752791085136, missing=nan,monotone_constraints=None, n_estimators=407, n_jobs=None,num_parallel_tree=None, ...)</pre></div></div></div></div></div></div></div></div></div></div>
mean_squared_log_error(y_valid, xgb_reg_1.predict(X_valid), squared=False)
0.1484868328972631
feature_importance = xgb_reg_1.regressor_.feature_importances_
feature_names = X_train.columnssorted_indices = feature_importance.argsort()
sorted_importance = feature_importance[sorted_indices]
sorted_features = feature_names[sorted_indices]plt.figure(figsize=(10, 6))
colors = plt.cm.tab20c.colors[:len(sorted_features)]  
plt.barh(sorted_features, sorted_importance, color=colors)
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.title('XGBoost Feature Importance')
plt.gca().invert_yaxis() 
plt.tight_layout()  
plt.show()

在这里插入图片描述

XGBoost 2

xgb_best_params_2 = {'eta': 0.08999645298052271,'max_depth': 6,'min_child_weight': 2.088127882610971,'subsample': 0.7725806961689413,'colsample_bytree': 0.9163306027660207,'lambda': 5.356530752285997,'n_estimators': 652
}
xgb_reg_2 = TransformedTargetRegressor(xgboost.XGBRegressor(**xgb_best_params_2, objective='reg:squaredlogerror', grow_policy='depthwise',tree_method="hist", random_state=42),func=np.log1p,inverse_func=np.expm1)
xgb_reg_2.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
                       regressor=XGBRegressor(base_score=None, booster=None,callbacks=None,colsample_bylevel=None,colsample_bynode=None,colsample_bytree=0.9163306027660207,early_stopping_rounds=None,enable_categorical=False,eta=0.08999645298052271,eval_metric=None,feature_types=None,gamma=None, gpu_...grow_policy=&#x27;depthwise&#x27;,importance_type=None,interaction_constraints=None,lambda=5.356530752285997,learning_rate=None,max_bin=None,max_cat_threshold=None,max_cat_to_onehot=None,max_delta_step=None,max_depth=6, max_leaves=None,min_child_weight=2.088127882610971,missing=nan,monotone_constraints=None,n_estimators=652, n_jobs=None,num_parallel_tree=None, ...))</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class="sk-container" hidden><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-4" type="checkbox" ><label for="sk-estimator-id-4" class="sk-toggleable__label sk-toggleable__label-arrow">TransformedTargetRegressor</label><div class="sk-toggleable__content"><pre>TransformedTargetRegressor(func=&lt;ufunc &#x27;log1p&#x27;&gt;, inverse_func=&lt;ufunc &#x27;expm1&#x27;&gt;,regressor=XGBRegressor(base_score=None, booster=None,callbacks=None,colsample_bylevel=None,colsample_bynode=None,colsample_bytree=0.9163306027660207,early_stopping_rounds=None,enable_categorical=False,eta=0.08999645298052271,eval_metric=None,feature_types=None,gamma=None, gpu_...grow_policy=&#x27;depthwise&#x27;,importance_type=None,interaction_constraints=None,lambda=5.356530752285997,learning_rate=None,max_bin=None,max_cat_threshold=None,max_cat_to_onehot=None,max_delta_step=None,max_depth=6, max_leaves=None,min_child_weight=2.088127882610971,missing=nan,monotone_constraints=None,n_estimators=652, n_jobs=None,num_parallel_tree=None, ...))</pre></div></div></div><div class="sk-parallel"><div class="sk-parallel-item"><div class="sk-item"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-5" type="checkbox" ><label for="sk-estimator-id-5" class="sk-toggleable__label sk-toggleable__label-arrow">regressor: XGBRegressor</label><div class="sk-toggleable__content"><pre>XGBRegressor(base_score=None, booster=None, callbacks=None,colsample_bylevel=None, colsample_bynode=None,colsample_bytree=0.9163306027660207, early_stopping_rounds=None,enable_categorical=False, eta=0.08999645298052271,eval_metric=None, feature_types=None, gamma=None, gpu_id=None,grow_policy=&#x27;depthwise&#x27;, importance_type=None,interaction_constraints=None, lambda=5.356530752285997,learning_rate=None, max_bin=None, max_cat_threshold=None,max_cat_to_onehot=None, max_delta_step=None, max_depth=6,max_leaves=None, min_child_weight=2.088127882610971, missing=nan,monotone_constraints=None, n_estimators=652, n_jobs=None,num_parallel_tree=None, ...)</pre></div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-6" type="checkbox" ><label for="sk-estimator-id-6" class="sk-toggleable__label sk-toggleable__label-arrow">XGBRegressor</label><div class="sk-toggleable__content"><pre>XGBRegressor(base_score=None, booster=None, callbacks=None,colsample_bylevel=None, colsample_bynode=None,colsample_bytree=0.9163306027660207, early_stopping_rounds=None,enable_categorical=False, eta=0.08999645298052271,eval_metric=None, feature_types=None, gamma=None, gpu_id=None,grow_policy=&#x27;depthwise&#x27;, importance_type=None,interaction_constraints=None, lambda=5.356530752285997,learning_rate=None, max_bin=None, max_cat_threshold=None,max_cat_to_onehot=None, max_delta_step=None, max_depth=6,max_leaves=None, min_child_weight=2.088127882610971, missing=nan,monotone_constraints=None, n_estimators=652, n_jobs=None,num_parallel_tree=None, ...)</pre></div></div></div></div></div></div></div></div></div></div>
mean_squared_log_error(y_valid, xgb_reg_2.predict(X_valid), squared=False)
0.14881444008796907

LIGHTGBM

def lgbm_objective(trial):# Define parameters to be optimized for the LGBMClassifierparam = {"verbosity": -1,"random_state": 42,"learning_rate": trial.suggest_float("learning_rate", 0.01, 0.05),"n_estimators": trial.suggest_int("n_estimators", 400, 1000),"lambda_l1": trial.suggest_float("lambda_l1", 0.005, 0.015),"lambda_l2": trial.suggest_float("lambda_l2", 0.02, 0.06),"max_depth": trial.suggest_int("max_depth", 6, 14),"colsample_bytree": trial.suggest_float("colsample_bytree", 0.3, 0.9),"subsample": trial.suggest_float("subsample", 0.8, 1.0),"min_child_samples": trial.suggest_int("min_child_samples", 10, 70),"num_leaves": trial.suggest_int("num_leaves", 30, 100),"min_split_gain": trial.suggest_float("min_split_gain", 0.1, 1.0)}lgbm_reg = lightgbm.LGBMRegressor(**param)lgbm_reg.fit(X_train, y_train)score = mean_squared_log_error(y_valid, lgbm_reg.predict(X_valid), squared=False)return score# Set up the sampler for Optuna optimization
sampler = optuna.samplers.TPESampler(seed=42)  # Using Tree-structured Parzen Estimator sampler for optimization# Create a study object for Optuna optimization
lgbm_study = optuna.create_study(direction="minimize", sampler=sampler)
[I 2024-04-25 16:46:32,141] A new study created in memory with name: no-name-889f25f6-876c-4982-ba46-f71528b83793
if TUNE:# Run the optimization processlgbm_study.optimize(lambda trial: lgbm_objective(trial), n_trials=200)# Get the best parameters after optimizationlgbm_best_params = lgbm_study.best_paramsprint('='*50)print(lgbm_best_params)

LIGHTGbm 1

lgbm_params_1 = {'learning_rate': 0.04090453688322824,'n_estimators': 788,'reg_lambda': 29.248167932522765,'reg_alpha': 0.4583079398945705,'max_depth': 19,'colsample_bytree': 0.5439642175304692,'subsample': 0.8659762900446526,'min_child_samples': 12,'num_leaves': 69,'random_state': 42,'n_jobs': -1,'verbose': -1
}
lgbm_reg_1 = TransformedTargetRegressor(lightgbm.LGBMRegressor(**lgbm_params_1),func=np.log1p,inverse_func=np.expm1)
lgbm_reg_1.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
                       regressor=LGBMRegressor(colsample_bytree=0.5439642175304692,learning_rate=0.04090453688322824,max_depth=19,min_child_samples=12,n_estimators=788, n_jobs=-1,num_leaves=69,random_state=42,reg_alpha=0.4583079398945705,reg_lambda=29.248167932522765,subsample=0.8659762900446526,verbose=-1))</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class="sk-container" hidden><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-7" type="checkbox" ><label for="sk-estimator-id-7" class="sk-toggleable__label sk-toggleable__label-arrow">TransformedTargetRegressor</label><div class="sk-toggleable__content"><pre>TransformedTargetRegressor(func=&lt;ufunc &#x27;log1p&#x27;&gt;, inverse_func=&lt;ufunc &#x27;expm1&#x27;&gt;,regressor=LGBMRegressor(colsample_bytree=0.5439642175304692,learning_rate=0.04090453688322824,max_depth=19,min_child_samples=12,n_estimators=788, n_jobs=-1,num_leaves=69,random_state=42,reg_alpha=0.4583079398945705,reg_lambda=29.248167932522765,subsample=0.8659762900446526,verbose=-1))</pre></div></div></div><div class="sk-parallel"><div class="sk-parallel-item"><div class="sk-item"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-8" type="checkbox" ><label for="sk-estimator-id-8" class="sk-toggleable__label sk-toggleable__label-arrow">regressor: LGBMRegressor</label><div class="sk-toggleable__content"><pre>LGBMRegressor(colsample_bytree=0.5439642175304692,learning_rate=0.04090453688322824, max_depth=19,min_child_samples=12, n_estimators=788, n_jobs=-1, num_leaves=69,random_state=42, reg_alpha=0.4583079398945705,reg_lambda=29.248167932522765, subsample=0.8659762900446526,verbose=-1)</pre></div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-9" type="checkbox" ><label for="sk-estimator-id-9" class="sk-toggleable__label sk-toggleable__label-arrow">LGBMRegressor</label><div class="sk-toggleable__content"><pre>LGBMRegressor(colsample_bytree=0.5439642175304692,learning_rate=0.04090453688322824, max_depth=19,min_child_samples=12, n_estimators=788, n_jobs=-1, num_leaves=69,random_state=42, reg_alpha=0.4583079398945705,reg_lambda=29.248167932522765, subsample=0.8659762900446526,verbose=-1)</pre></div></div></div></div></div></div></div></div></div></div>
mean_squared_log_error(y_valid, lgbm_reg_1.predict(X_valid), squared=False)
0.1477381305885499
feature_importance = lgbm_reg_1.regressor_.feature_importances_feature_names = X_train.columnssorted_indices = feature_importance.argsort()
sorted_importance = feature_importance[sorted_indices]
sorted_features = feature_names[sorted_indices]# Plot feature importance
plt.figure(figsize=(12, 8))
colors = plt.cm.Paired.colors[:len(sorted_features)]  
plt.barh(sorted_features, sorted_importance, color=colors)
plt.xlabel('Importance', fontsize=12)
plt.ylabel('Feature', fontsize=12)
plt.title('LightGBM Feature Importance', fontsize=14)
plt.gca().invert_yaxis() for i, v in enumerate(sorted_importance):plt.text(v + 0.02, i, f'{v:.2f}', color='black', va='center', fontsize=10)plt.tight_layout()  
plt.show()

在这里插入图片描述

LIGHTGbm 2

lgbm_params_2 = {'n_jobs': -1,'verbose': -1,'max_depth': 20,'num_leaves': 165,'subsample_freq': 1,'random_state': 42,'n_estimators': 1460,'min_child_samples': 25,'reg_lambda': 6.13475387151606,'subsample': 0.8036874216939632,'reg_alpha': 0.3152990674231573,'learning_rate': 0.009336479469693189,'colsample_bytree': 0.5780931837049811,'min_child_weight': 0.37333232256934057,
}
lgbm_reg_2 = TransformedTargetRegressor(lightgbm.LGBMRegressor(**lgbm_params_2),func=np.log1p,inverse_func=np.expm1)
lgbm_reg_2.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
                       regressor=LGBMRegressor(colsample_bytree=0.5780931837049811,learning_rate=0.009336479469693189,max_depth=20,min_child_samples=25,min_child_weight=0.37333232256934057,n_estimators=1460, n_jobs=-1,num_leaves=165,random_state=42,reg_alpha=0.3152990674231573,reg_lambda=6.13475387151606,subsample=0.8036874216939632,subsample_freq=1,verbose=-1))</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class="sk-container" hidden><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-10" type="checkbox" ><label for="sk-estimator-id-10" class="sk-toggleable__label sk-toggleable__label-arrow">TransformedTargetRegressor</label><div class="sk-toggleable__content"><pre>TransformedTargetRegressor(func=&lt;ufunc &#x27;log1p&#x27;&gt;, inverse_func=&lt;ufunc &#x27;expm1&#x27;&gt;,regressor=LGBMRegressor(colsample_bytree=0.5780931837049811,learning_rate=0.009336479469693189,max_depth=20,min_child_samples=25,min_child_weight=0.37333232256934057,n_estimators=1460, n_jobs=-1,num_leaves=165,random_state=42,reg_alpha=0.3152990674231573,reg_lambda=6.13475387151606,subsample=0.8036874216939632,subsample_freq=1,verbose=-1))</pre></div></div></div><div class="sk-parallel"><div class="sk-parallel-item"><div class="sk-item"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-11" type="checkbox" ><label for="sk-estimator-id-11" class="sk-toggleable__label sk-toggleable__label-arrow">regressor: LGBMRegressor</label><div class="sk-toggleable__content"><pre>LGBMRegressor(colsample_bytree=0.5780931837049811,learning_rate=0.009336479469693189, max_depth=20,min_child_samples=25, min_child_weight=0.37333232256934057,n_estimators=1460, n_jobs=-1, num_leaves=165, random_state=42,reg_alpha=0.3152990674231573, reg_lambda=6.13475387151606,subsample=0.8036874216939632, subsample_freq=1, verbose=-1)</pre></div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-12" type="checkbox" ><label for="sk-estimator-id-12" class="sk-toggleable__label sk-toggleable__label-arrow">LGBMRegressor</label><div class="sk-toggleable__content"><pre>LGBMRegressor(colsample_bytree=0.5780931837049811,learning_rate=0.009336479469693189, max_depth=20,min_child_samples=25, min_child_weight=0.37333232256934057,n_estimators=1460, n_jobs=-1, num_leaves=165, random_state=42,reg_alpha=0.3152990674231573, reg_lambda=6.13475387151606,subsample=0.8036874216939632, subsample_freq=1, verbose=-1)</pre></div></div></div></div></div></div></div></div></div></div>
mean_squared_log_error(y_valid, lgbm_reg_2.predict(X_valid), squared=False)
0.14758741259851116

CatBoost

def cb_objective(trial):params = {"learning_rate": trial.suggest_float("learning_rate", 0.01, 0.5),"max_depth": trial.suggest_int("depth", 4, 16),"l2_leaf_reg": trial.suggest_float("l2_leaf_reg", 1, 10),"n_estimators": trial.suggest_int("n_estimators", 100, 1500),"colsample_bylevel": trial.suggest_float("colsample_bylevel", 0.5, 1.0),}cb_reg = TransformedTargetRegressor(catboost.CatBoostRegressor(**params, random_state=42, grow_policy='SymmetricTree',random_strength=0, cat_features=["Sex"], loss_function="RMSE"),func=np.log1p,inverse_func=np.expm1)cb_reg.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)val_scores = np.sqrt(mean_squared_log_error(y_valid, np.abs(cb_reg.predict(X_valid))))return val_scoressampler = optuna.samplers.TPESampler(seed=42)  # Using Tree-structured Parzen Estimator sampler for optimization
cb_study = optuna.create_study(direction = 'minimize',study_name="CBRegressor", sampler=sampler)
[I 2024-04-25 16:46:44,532] A new study created in memory with name: CBRegressor
if TUNE:cb_study.optimize(cb_objective, 30)

CatBoost 1

cb_params_1 = {'grow_policy': 'SymmetricTree', 'n_estimators': 1000, 'learning_rate': 0.128912681527133, 'l2_leaf_reg': 1.836927907521674, 'max_depth': 6, 'colsample_bylevel': 0.6775373040510968, 'random_strength': 0, 'boost_from_average': True, 'loss_function': 'RMSE', 'cat_features': ['Sex'], 'verbose': False}
cat_reg_1 = TransformedTargetRegressor(catboost.CatBoostRegressor(**cb_params_1),func=np.log1p,inverse_func=np.expm1)
cat_reg_1.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
                       regressor=&lt;catboost.core.CatBoostRegressor object at 0x0000021D08EB7310&gt;)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class="sk-container" hidden><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-13" type="checkbox" ><label for="sk-estimator-id-13" class="sk-toggleable__label sk-toggleable__label-arrow">TransformedTargetRegressor</label><div class="sk-toggleable__content"><pre>TransformedTargetRegressor(func=&lt;ufunc &#x27;log1p&#x27;&gt;, inverse_func=&lt;ufunc &#x27;expm1&#x27;&gt;,regressor=&lt;catboost.core.CatBoostRegressor object at 0x0000021D08EB7310&gt;)</pre></div></div></div><div class="sk-parallel"><div class="sk-parallel-item"><div class="sk-item"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-14" type="checkbox" ><label for="sk-estimator-id-14" class="sk-toggleable__label sk-toggleable__label-arrow">regressor: CatBoostRegressor</label><div class="sk-toggleable__content"><pre>&lt;catboost.core.CatBoostRegressor object at 0x0000021D08EB7310&gt;</pre></div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-15" type="checkbox" ><label for="sk-estimator-id-15" class="sk-toggleable__label sk-toggleable__label-arrow">CatBoostRegressor</label><div class="sk-toggleable__content"><pre>&lt;catboost.core.CatBoostRegressor object at 0x0000021D08EB7310&gt;</pre></div></div></div></div></div></div></div></div></div></div>
mean_squared_log_error(y_valid, cat_reg_1.predict(X_valid), squared=False)
0.14824841372252795

CatBoost 2

cb_params_2 = {'depth': 15, 'verbose': 0,'max_bin': 464, 'verbose': False,'random_state':42,'task_type': 'CPU', 'random_state': 42,'min_data_in_leaf': 78, 'loss_function': 'RMSE', 'grow_policy': 'Lossguide', 'bootstrap_type': 'Bernoulli', 'subsample': 0.83862137638162, 'l2_leaf_reg': 8.365422739510098, 'random_strength': 3.296124856352495, 'learning_rate': 0.09992185242598203,
}
cat_reg_2 = TransformedTargetRegressor(catboost.CatBoostRegressor(**cb_params_2),func=np.log1p,inverse_func=np.expm1)
cat_reg_2.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
                       regressor=&lt;catboost.core.CatBoostRegressor object at 0x0000021D088EB4C0&gt;)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class="sk-container" hidden><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-16" type="checkbox" ><label for="sk-estimator-id-16" class="sk-toggleable__label sk-toggleable__label-arrow">TransformedTargetRegressor</label><div class="sk-toggleable__content"><pre>TransformedTargetRegressor(func=&lt;ufunc &#x27;log1p&#x27;&gt;, inverse_func=&lt;ufunc &#x27;expm1&#x27;&gt;,regressor=&lt;catboost.core.CatBoostRegressor object at 0x0000021D088EB4C0&gt;)</pre></div></div></div><div class="sk-parallel"><div class="sk-parallel-item"><div class="sk-item"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-17" type="checkbox" ><label for="sk-estimator-id-17" class="sk-toggleable__label sk-toggleable__label-arrow">regressor: CatBoostRegressor</label><div class="sk-toggleable__content"><pre>&lt;catboost.core.CatBoostRegressor object at 0x0000021D088EB4C0&gt;</pre></div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-18" type="checkbox" ><label for="sk-estimator-id-18" class="sk-toggleable__label sk-toggleable__label-arrow">CatBoostRegressor</label><div class="sk-toggleable__content"><pre>&lt;catboost.core.CatBoostRegressor object at 0x0000021D088EB4C0&gt;</pre></div></div></div></div></div></div></div></div></div></div>
mean_squared_log_error(y_valid, cat_reg_2.predict(X_valid), squared=False)
0.14774083919007364

Ensembling the Results Using VotingRegressor 使用 VotingRegressor 组合结果

# weights = [0.025, 0.025, 0.275, 0.275, 0.05, 0.35]ensemble = VotingRegressor([
#         ("xgb_1", xgb_reg_1),
#         ("xgb_2", xgb_reg_2),("lgbm_1", lgbm_reg_1),("lgbm_2", lgbm_reg_2),("cb_1", cat_reg_1),("cb_2", cat_reg_2)]
)
ensemble.fit(X, y)

在这里插入图片描述

Submit the Output

pred = ensemble.predict(test.drop("id", axis=1))
submission = pd.DataFrame(test.id)
submission["Rings"] = pred
submission.to_csv("submission.csv", index=False)

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/pingmian/4708.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

GMSSL编译iOS

一、GMSSL-2.x 国密SDK源码下载&#xff0c;对GMSSL库进行编译生成对应的静态库。执行如下命令&#xff1a; cd到SDK源码目录 cd /Users/xxxx/Downloads/GMSSLV2-master查看SDK适用环境 ./config上图中错误解决方法 使用文本编辑器打开SDK目录下Configure、test/build.info、…

Android 学习 鸿蒙HarmonyOS 4.0 第二天(项目结构认识)

项目结构认识 和 了解&#xff1a; 工程目录下的结构如下&#xff1a; 首先能看到有两个.开头的文件&#xff0c;分别是.hvigor 和 .idea。这两个文件夹都是与构建有关系的&#xff0c; 如果你开发过安卓app&#xff0c;构建完会生成一个apk安装包&#xff0c;鸿蒙则是生成hap…

【Oracle】python调取oracle数据教程

目录 &#xff08;1&#xff09;安装python和相关库 1.python的下载和安装 2.python安装cx_Oracle库和pandas库 3.本机安装instantclient 数据库客户端 先安装instantclient 然后设置环境变量 &#xff08;2&#xff09;准备好连接Oracle数据库地址等五项信息 &#xf…

Linux(Centos 7)环境下安装wget,并且更换阿里云镜像

Linux(Centos 7) Minimal 安装后&#xff0c;由于没有预装wget&#xff0c;在使用wget命令去下载安装相关应用时&#xff0c;提示&#xff1a;“wget: command not found” 先在Linux服务器窗口中&#xff0c;输入如下命令&#xff0c;检查Linux服务器有没有安装过wget。 rpm -…

deepflow grafana plugin 编译问题解决

修改tsconfig.js 增加"noImplicitAny": false&#xff0c;解决代码类型没有指定&#xff0c;显示Any 错误 To solve the error, explicitly set the parameters type to any, use a more specific type or set noImplicitAny to false in tsconfig.json. https://b…

【大学生电子竞赛题目分析】——2023年H题《信号分离装置》

今年的大赛已临近落幕&#xff0c;笔者打算陆续对几个熟悉领域的题目作一番分析与讨论&#xff0c;今天首先分析H题。 网上有一些关于H题的分析&#xff0c;许多都是针对盲信号分析的。然而本题具有明确的信号频率范围&#xff0c;明确的信号可能频率&#xff0c;明确的信号波…

Jmeter Beanshell 设置全局变量

//获取token import com.alibaba.fastjson.JSONObject; import com.alibaba.fastjson.JSONArray; import java.util.*; import org.apache.jmeter.util.JMeterUtils; //获取可上机机器 String response prev.getResponseDataAsString(); JSONObject responseObect JSONObjec…

什么是跨域? 出现原因及解决方法

什么是跨域? 出现原因及解决方法 什么是跨域 跨域&#xff1a;浏览器对于javascript的同源策略的限制 。 同源政策的目的&#xff0c;是为了保证用户信息的安全&#xff0c;防止恶意的网站窃取数据。 设想这样一种情况&#xff1a;A 网站是一家银行&#xff0c;用户登录以后…

K8S哲学 - statefulSet 灰度发布

kubectl get - 获取资源及配置文件 kubectl get resource 【resourceName -oyaml】 kubectl create - 指定镜像创建或者 指定文件创建 kubectl create resource 【resourceName】 --imagemyImage 【-f my.yaml】 kubectl delete kubectl describe resource resourc…

怎么把试卷答案去掉再打印出来?

在学习中&#xff0c;试卷无疑是检验学习成果的重要工具。然而&#xff0c;当我们想重新练习这些试卷&#xff0c;加深对知识点的理解和记忆时&#xff0c;答案的存在往往会成为他们复习路上的“绊脚石”。那么&#xff0c;有没有一种方法可以轻松去除试卷上的答案&#xff0c;…

亚马逊云科技AWS将推出数据工程师全新认证(有资料)

AWS认证体系最近更新&#xff0c;在原有12张的基础上&#xff0c;将在2023年11月27日添加第13张&#xff0c;数据工程师助理级认证(Data Engineer Associate)&#xff0c;并且在2024/1/12前半价(省75刀&#xff1d;544人民币。 原有的数据分析专家级认证(Data Analytics Specia…

qt-C++笔记之滑动条QSlider和QProgressBar进度条

qt-C笔记之滑动条QSlider和QProgressBar进度条 —— 2024-04-28 杭州 本例来自《Qt6 C开发指南》 文章目录 qt-C笔记之滑动条QSlider和QProgressBar进度条1.运行2.阅读笔记3.文件结构4.samp4_06.pro5.main.cpp6.widget.h7.widget.cpp8.widget.ui 1.运行 2.阅读笔记 3.文件结构…

RuoYi-Vue-Plus (SPEL 表达式)

RuoYi-Vue-Plus 中SPEL使用 DataScopeType 枚举类中&#xff1a; /*** 部门数据权限*/DEPT("3", " #{#deptName} #{#user.deptId} ", " 1 0 "), PlusDataPermissionHandler 拦截器中定义了解析器&#xff1a; buildDataFilter 方法中根据注解的…

[LitCTF 2023]Ping、[SWPUCTF 2021 新生赛]error、[NSSCTF 2022 Spring Recruit]babyphp

[LitCTF 2023]Ping 尝试ping一下127.0.0.1成功了&#xff0c;但要查看根目录时提示只能输入IP 查看源代码&#xff0c;这段JavaScript代码定义了一个名为check_ip的函数&#xff0c;用于验证输入是否为有效的IPv4地址。并且使用正则表达式re来匹配IPv4地址的格式。 对于这种写…

机器学习:基于Sklearn、XGBoost框架,使用逻辑回归、支持向量机和XGBClassifier预测帕金森病

前言 系列专栏&#xff1a;机器学习&#xff1a;高级应用与实践【项目实战100】【2024】✨︎ 在本专栏中不仅包含一些适合初学者的最新机器学习项目&#xff0c;每个项目都处理一组不同的问题&#xff0c;包括监督和无监督学习、分类、回归和聚类&#xff0c;而且涉及创建深度学…

【已解决】Python Selenium chromedriver Pycharm闪退的问题

概要 根据不同的业务场景需求&#xff0c;有时我们难免会使用程序来打开浏览器进行访问。本文在pycharm中使用selenium打开chromedriver出现闪退问题&#xff0c;根据不断尝试&#xff0c;最终找到的问题根本是版本问题。 代码如下 # (1) 导入selenium from selenium import …

科研学习|论文解读——CVPR 2021 人脸造假检测(论文合集)

随着图像合成技术的成熟&#xff0c;利用一张人脸照片合成假视频/不良视频现象越来越多&#xff0c;严重侵犯个人隐私、妨碍司法公正&#xff0c;所以人脸造假检测越来越重要&#xff0c;学术界的论文也越来越多。 一、研究1 1.1 论文题目 Multi-attentional Deepfake Detecti…

自学Python爬虫js逆向(二)chrome浏览器开发者工具的使用

js逆向中很多工作需要使用浏览器中的开发者工具&#xff0c;所以这里以chrome为例&#xff0c;先把开发者工具的使用总结一下&#xff0c;后面用到的时候可以回来查询。 Google Chrome浏览器的开发者工具是前端开发者的利器&#xff0c;它不仅提供了丰富的功能用于开发、调试和…

什么是中间件?中间件有哪些?

什么是中间件&#xff1f; 中间件&#xff08;Middleware&#xff09;是指在客户端和服务器之间的一层软件组件&#xff0c;用于处理请求和响应的过程。 中间件是指介于两个不同系统之间的软件组件&#xff0c;它可以在两个系统之间传递、处理、转换数据&#xff0c;以达到协…

排序算法(2)快排

交换排序 思想&#xff1a;所谓交换&#xff0c;就是根据序列中两个记录键值的比较结果来对换这两个记录在序列中的位置&#xff0c;交换排序的特点是&#xff1a;将键值较大的记录向序列的尾部移动&#xff0c;键值较小的记录向序列的前部移动。 一、冒泡排序 public static…