Kaggle (4): Regression with an Abalone Dataset

import pandas as pd
import numpy as np

import xgboost
import lightgbm
import optuna
import catboost

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_log_error
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import VotingRegressor, StackingRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

import seaborn as sns
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

PROJECT DESCRIPTION

The goal is to predict the age of abalone from physical measurements. The age of an abalone is normally determined by cutting the shell through the cone, staining it, and counting the number of rings under a microscope.

PHYSICAL ATTRIBUTES

SEX: Male/Female/Infant
LENGTH: Longest shell measurement
DIAMETER: Diameter of the Abalone
HEIGHT: Height of the Abalone
WHOLE WEIGHT: Weight of the whole abalone
SHUCKED WEIGHT: Weight of the meat
VISCERA WEIGHT: Gut weight (internal organs)
SHELL WEIGHT: Shell Weight after drying
RINGS: Number of rings +1.5 gives Age of the Abalone
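
The rings-to-age rule above can be written as a one-line helper. This is only a sketch: the function name `rings_to_age` is hypothetical and does not appear in the notebook, and the ring counts used are the ones shown in `train.head()` further down.

```python
def rings_to_age(rings):
    """Convert a ring count to an approximate age in years (rings + 1.5)."""
    return rings + 1.5

# Ring counts of the first five training rows
for rings in [11, 11, 6, 10, 9]:
    print(rings, "rings ->", rings_to_age(rings), "years")
```

The models below predict Rings directly; a conversion like this is only needed if an age in years is wanted at the end.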

Load the Datasets

# original = pd.read_csv("/kaggle/input/abalone-dataset/abalone.csv")
# train = pd.read_csv("/kaggle/input/playground-series-s4e4/train.csv")
# test = pd.read_csv("/kaggle/input/playground-series-s4e4/test.csv")
original = pd.read_csv("./data/abalone.csv")
train = pd.read_csv("./data/train.csv")
test = pd.read_csv("./data/test.csv")

Make the data ready for tuning

train = train.drop("id", axis=1)
# The competition files name two weight columns 'Whole weight.1' and 'Whole weight.2';
# rename them to the descriptive names from the attribute list above.
weight_columns = {'Whole weight.1': 'Shucked weight', 'Whole weight.2': 'Viscera weight'}
train = train.rename(columns=weight_columns)
test = test.rename(columns=weight_columns)
# Append the original abalone data to the competition training set.
train = pd.concat([train, original], axis=0)
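
Before concatenating, it can help to confirm that both frames really share the same columns, since `pd.concat` silently fills mismatched columns with NaN. The sketch below uses two tiny made-up frames (the values are illustrative, not real rows), and it assumes the original file already uses the descriptive column names.

```python
import pandas as pd

# Toy stand-ins for the competition and original files (hypothetical values)
synthetic = pd.DataFrame({
    "id": [0],
    "Whole weight": [0.77],
    "Whole weight.1": [0.33],
    "Whole weight.2": [0.15],
    "Rings": [11],
})
original_toy = pd.DataFrame({
    "Whole weight": [0.51],
    "Shucked weight": [0.22],
    "Viscera weight": [0.10],
    "Rings": [15],
})

rename_map = {"Whole weight.1": "Shucked weight", "Whole weight.2": "Viscera weight"}
synthetic = synthetic.drop(columns="id").rename(columns=rename_map)

# Columns present in one frame but not the other would become NaN after concat
mismatched = set(synthetic.columns) ^ set(original_toy.columns)
print("Mismatched columns:", mismatched)

combined = pd.concat([synthetic, original_toy], axis=0, ignore_index=True)
print(combined)
```

`ignore_index=True` is used here only so the toy result gets a clean 0..n-1 index; the notebook itself keeps the original indices.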

Get familiar with the Data

train.head()
  Sex  Length  Diameter  Height  Whole weight  Shucked weight  Viscera weight  Shell weight  Rings
0   F   0.550     0.430   0.150        0.7715          0.3285          0.1465        0.2400     11
1   F   0.630     0.490   0.145        1.1300          0.4580          0.2765        0.3200     11
2   I   0.160     0.110   0.025        0.0210          0.0055          0.0030        0.0050      6
3   M   0.595     0.475   0.150        0.9145          0.3755          0.2055        0.2500     10
4   I   0.555     0.425   0.130        0.7820          0.3695          0.1600        0.1975      9
print(f"The shape of training dataset is : {train.shape}")
print(f"The shape of testing dataset is : {test.shape}")
The shape of training dataset is : (94792, 9)
The shape of testing dataset is : (60411, 9)
test.head()
      id  Sex  Length  Diameter  Height  Whole weight  Shucked weight  Viscera weight  Shell weight
0  90615    M   0.645     0.475   0.155        1.2380          0.6185          0.3125        0.3005
1  90616    M   0.580     0.460   0.160        0.9830          0.4785          0.2195        0.2750
2  90617    M   0.560     0.420   0.140        0.8395          0.3525          0.1845        0.2405
3  90618    M   0.570     0.490   0.145        0.8740          0.3525          0.1865        0.2350
4  90619    I   0.415     0.325   0.110        0.3580          0.1575          0.0670        0.1050
train.groupby("Sex").count()["Length"]
Sex
F    27802
I    34435
M    32555
Name: Length, dtype: int64
test.groupby("Sex").count()["Length"]
Sex
F    17387
I    22241
M    20783
Name: Length, dtype: int64
np.sort(pd.unique(train.Rings))
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29], dtype=int64)

View the Distribution

train.hist(figsize=(12, 10), grid=True, bins=50)
plt.tight_layout()
plt.axis("off")
(0.0, 1.0, 0.0, 1.0)

test.hist(figsize=(12, 10), grid=True, bins=50)
plt.tight_layout()
plt.axis("off")
(0.0, 1.0, 0.0, 1.0)

CONTINUOUS COLUMN ANALYSIS

# Set up warnings to be ignored (optional)
warnings.filterwarnings("ignore")
pd.set_option('mode.use_inf_as_na', False)

# Work on a copy so that casting Rings to str for plotting does not alter train itself
train_str = train.copy()
train_str['Rings'] = train_str['Rings'].astype(str)

# List of continuous variables in your dataset
continuous_vars = ['Length', 'Diameter', 'Height', 'Whole weight', 'Shucked weight', 'Viscera weight', 'Shell weight']

# Set hue to your target column
target_column = 'Rings'

for column in continuous_vars:
    fig, axes = plt.subplots(1, 2, figsize=(18, 4))  # Create subplots with 1 row and 2 columns

    # Plot histogram with hue and explicit labels
    sns.histplot(data=train_str, x=column, hue=target_column, bins=50, kde=True, ax=axes[0], palette='muted', legend=False)
    axes[0].set_title(f'Histogram of {column} with {target_column} Hue')
    axes[0].set_xlabel(column)
    axes[0].set_ylabel('Count')
    axes[0].legend(title=target_column, loc='upper right')

    # Plot KDE plot with hue and explicit labels
    sns.kdeplot(data=train_str, x=column, hue=target_column, ax=axes[1], palette='muted', legend=False)
    axes[1].set_title(f'KDE Plot of {column} with {target_column} Hue')
    axes[1].set_xlabel(column)
    axes[1].set_ylabel('Density')
    axes[1].legend(title=target_column, loc='upper right')

    plt.tight_layout()  # Adjust spacing between subplots
    plt.show()

ANALYSIS BY QQ PLOT

import scipy.stats as stats

def qq_plot_with_skewness(data, quantitative_var):
    # Check if the variable is present in the DataFrame
    if quantitative_var not in data.columns:
        print(f"Error: '{quantitative_var}' not found in the DataFrame.")
        return

    f, ax = plt.subplots(1, 2, figsize=(18, 5.5))

    # Check for missing values
    if data[quantitative_var].isnull().any():
        print(f"Warning: '{quantitative_var}' contains missing values. Results may be affected.")

    # QQ plot
    stats.probplot(data[quantitative_var], plot=ax[0], fit=True)
    ax[0].set_title(f'QQ Plot for {quantitative_var}')

    # Skewness plot
    sns.histplot(data[quantitative_var], kde=True, ax=ax[1])
    ax[1].set_title(f'Distribution of {quantitative_var}')

    # Calculate skewness value
    skewness_value = stats.skew(data[quantitative_var])

    # Display skewness value on the plot
    ax[1].text(0.5, 0.5, f'Skewness: {skewness_value:.2f}', transform=ax[1].transAxes,
               horizontalalignment='center', verticalalignment='center', fontsize=16, color='red')

    plt.show()

# Example usage for each continuous variable
for var in continuous_vars:
    qq_plot_with_skewness(train, var)
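
For readers wondering what the skewness number on each plot measures: with its default settings, `stats.skew` returns the third central moment divided by the second central moment raised to the power 1.5. The following is a minimal NumPy sketch of that formula; the helper name `sample_skewness` and the toy arrays are made up for illustration.

```python
import numpy as np

def sample_skewness(values):
    """Biased sample skewness: m3 / m2**1.5, using central moments."""
    x = np.asarray(values, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)   # second central moment (variance)
    m3 = np.mean(centered ** 3)   # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]       # mirror-symmetric around its mean
right_tailed = [1, 1, 1, 1, 10]   # one large value pulls the tail to the right

print(sample_skewness(symmetric))
print(sample_skewness(right_tailed))
```

A value near zero indicates a roughly symmetric distribution, while a positive value indicates a longer right tail, which is what a QQ plot shows as points curving away from the reference line at the upper end.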

Split the Dataset

sex_to_num = {"M": 0, "F": 1, "I": 2}
train["Sex"] = train["Sex"].replace(sex_to_num)
test["Sex"] = test["Sex"].replace(sex_to_num)
train.groupby("Sex").count()["Length"]
Sex
0    32555
1    27802
2    34435
Name: Length, dtype: int64
X = train.drop("Rings", axis=1)
y = train.Rings
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
X_valid, X_test, y_valid, y_test = train_test_split(X_test, y_test, test_size=0.5, random_state=42, stratify=y_test)

Here the stratify parameter keeps the distribution of Rings the same across the training, validation, and test splits.
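
The effect of stratify can be checked on a small synthetic target. The sketch below uses randomly generated labels (not the abalone data) and a hypothetical helper `class_shares`; it compares the class proportions in the full set with those in the resulting splits. Note that scikit-learn refuses to stratify when a class has only a single member, so very rare ring counts may need to be handled first.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic "ring counts" with unequal class frequencies (illustrative only)
y_toy = rng.choice([8, 9, 10], size=3000, p=[0.2, 0.5, 0.3])
X_toy = rng.normal(size=(3000, 2))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)

def class_shares(labels):
    """Return {class: share of rows} for a label array."""
    values, counts = np.unique(labels, return_counts=True)
    return {int(v): float(c) / len(labels) for v, c in zip(values, counts)}

print("full :", class_shares(y_toy))
print("train:", class_shares(y_tr))
print("test :", class_shares(y_te))
```

The three printed dictionaries should agree up to rounding, which is the property the notebook relies on when it splits 70/15/15.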

XGBoost

We will implement two XGBoost models.

def xgb_objective(trial):
    params = {
        "eta": trial.suggest_float("eta", 0.01, 1.0),
        "gamma": 0.0,
        "max_depth": trial.suggest_int("max_depth", 3, 20),
        "min_child_weight": trial.suggest_float("min_child_weight", 1., 50.),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "reg_lambda": trial.suggest_float("lambda", 1.0, 100.0),
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000)
    }
    xgb_reg = TransformedTargetRegressor(
        xgboost.XGBRegressor(**params, objective='reg:squarederror', grow_policy='lossguide',
                             tree_method="hist", random_state=42),
        func=np.log1p,
        inverse_func=np.expm1
    )
    xgb_reg.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
    val_scores = mean_squared_log_error(y_valid, xgb_reg.predict(X_valid), squared=False)
    return val_scores

sampler = optuna.samplers.TPESampler(seed=42)  # Using Tree-structured Parzen Estimator sampler for optimization
xgb_study = optuna.create_study(direction='minimize', study_name="XgbRegressor", sampler=sampler)
[I 2024-04-25 16:46:29,229] A new study created in memory with name: XgbRegressor

XGBoost 1

TUNE = False
if TUNE:
    xgb_study.optimize(xgb_objective, n_trials=500)

Set TUNE to True if you want to run hyperparameter tuning. With TUNE left at False, the search is skipped and the previously found parameters below are used directly.

xgb_best_params_1 = {
    'eta': 0.1006321838798394,
    'max_depth': 6,
    'min_child_weight': 27.999752791085136,
    'subsample': 0.7344797943645852,
    'colsample_bytree': 0.5389765810810496,
    'lambda': 79.62358968148187,
    'n_estimators': 407
}
xgb_reg_1 = TransformedTargetRegressor(
    xgboost.XGBRegressor(**xgb_best_params_1, objective='reg:squarederror', grow_policy='lossguide',
                         tree_method="hist", random_state=42, gamma=0.0),
    func=np.log1p,
    inverse_func=np.expm1
)
xgb_reg_1.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
    regressor=XGBRegressor(base_score=None, booster=None, callbacks=None,
        colsample_bylevel=None, colsample_bynode=None,
        colsample_bytree=0.5389765810810496, early_stopping_rounds=None,
        enable_categorical=False, eta=0.1006321838798394, eval_metric=None,
        feature_types=None, gamma=0.0, gpu_id...grow_policy='lossguide',
        importance_type=None, interaction_constraints=None,
        lambda=79.62358968148187, learning_rate=None, max_bin=None,
        max_cat_threshold=None, max_cat_to_onehot=None, max_delta_step=None,
        max_depth=6, max_leaves=None, min_child_weight=27.999752791085136,
        missing=nan, monotone_constraints=None, n_estimators=407, n_jobs=None,
        num_parallel_tree=None, ...))
mean_squared_log_error(y_valid, xgb_reg_1.predict(X_valid), squared=False)
0.1484868328972631
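
The score above is the root mean squared log error (RMSLE). Wrapping the regressor with log1p and expm1 fits this metric naturally: a squared-error model trained on log1p(Rings) is effectively minimizing squared log error on the original scale. A small NumPy sketch of that equivalence follows; the true values are the ring counts from `train.head()`, while the predictions are made-up numbers used only for illustration.

```python
import numpy as np

def rmse(a, b):
    """Root mean squared error between two arrays."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

def rmsle(y_true, y_pred):
    """Root mean squared log error, using log(1 + y) on both arguments."""
    return rmse(np.log1p(y_true), np.log1p(y_pred))

y_true = np.array([11.0, 11.0, 6.0, 10.0, 9.0])
y_pred = np.array([10.2, 12.1, 6.5, 9.4, 9.9])  # hypothetical predictions

# What the inner regressor works with: targets and predictions on the log1p scale
z_true = np.log1p(y_true)
z_pred = np.log1p(y_pred)

score_original_scale = rmsle(y_true, y_pred)  # metric reported in this notebook
score_log_scale = rmse(z_true, z_pred)        # plain RMSE in the transformed space
print(score_original_scale, score_log_scale)

# expm1 undoes log1p, so predictions come back on the Rings scale
y_back = np.expm1(z_pred)
```

Both printed numbers are the same quantity computed two ways, which is why a plain RMSE-style objective inside the wrapper lines up with the RMSLE reported here.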
feature_importance = xgb_reg_1.regressor_.feature_importances_
feature_names = X_train.columns

sorted_indices = feature_importance.argsort()
sorted_importance = feature_importance[sorted_indices]
sorted_features = feature_names[sorted_indices]

plt.figure(figsize=(10, 6))
colors = plt.cm.tab20c.colors[:len(sorted_features)]  
plt.barh(sorted_features, sorted_importance, color=colors)
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.title('XGBoost Feature Importance')
plt.gca().invert_yaxis() 
plt.tight_layout()  
plt.show()

XGBoost 2

xgb_best_params_2 = {
    'eta': 0.08999645298052271,
    'max_depth': 6,
    'min_child_weight': 2.088127882610971,
    'subsample': 0.7725806961689413,
    'colsample_bytree': 0.9163306027660207,
    'lambda': 5.356530752285997,
    'n_estimators': 652
}
xgb_reg_2 = TransformedTargetRegressor(
    xgboost.XGBRegressor(**xgb_best_params_2, objective='reg:squaredlogerror', grow_policy='depthwise',
                         tree_method="hist", random_state=42),
    func=np.log1p,
    inverse_func=np.expm1
)
xgb_reg_2.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
    regressor=XGBRegressor(base_score=None, booster=None, callbacks=None,
        colsample_bylevel=None, colsample_bynode=None,
        colsample_bytree=0.9163306027660207, early_stopping_rounds=None,
        enable_categorical=False, eta=0.08999645298052271, eval_metric=None,
        feature_types=None, gamma=None, gpu_...grow_policy='depthwise',
        importance_type=None, interaction_constraints=None,
        lambda=5.356530752285997, learning_rate=None, max_bin=None,
        max_cat_threshold=None, max_cat_to_onehot=None, max_delta_step=None,
        max_depth=6, max_leaves=None, min_child_weight=2.088127882610971,
        missing=nan, monotone_constraints=None, n_estimators=652, n_jobs=None,
        num_parallel_tree=None, ...))
mean_squared_log_error(y_valid, xgb_reg_2.predict(X_valid), squared=False)
0.14881444008796907

LIGHTGBM

def lgbm_objective(trial):
    # Define parameters to be optimized for the LGBMRegressor
    param = {
        "verbosity": -1,
        "random_state": 42,
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.05),
        "n_estimators": trial.suggest_int("n_estimators", 400, 1000),
        "lambda_l1": trial.suggest_float("lambda_l1", 0.005, 0.015),
        "lambda_l2": trial.suggest_float("lambda_l2", 0.02, 0.06),
        "max_depth": trial.suggest_int("max_depth", 6, 14),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.3, 0.9),
        "subsample": trial.suggest_float("subsample", 0.8, 1.0),
        "min_child_samples": trial.suggest_int("min_child_samples", 10, 70),
        "num_leaves": trial.suggest_int("num_leaves", 30, 100),
        "min_split_gain": trial.suggest_float("min_split_gain", 0.1, 1.0)
    }
    lgbm_reg = lightgbm.LGBMRegressor(**param)
    lgbm_reg.fit(X_train, y_train)
    score = mean_squared_log_error(y_valid, lgbm_reg.predict(X_valid), squared=False)
    return score

# Set up the sampler for Optuna optimization
sampler = optuna.samplers.TPESampler(seed=42)  # Using Tree-structured Parzen Estimator sampler for optimization

# Create a study object for Optuna optimization
lgbm_study = optuna.create_study(direction="minimize", sampler=sampler)
[I 2024-04-25 16:46:32,141] A new study created in memory with name: no-name-889f25f6-876c-4982-ba46-f71528b83793
if TUNE:
    # Run the optimization process
    lgbm_study.optimize(lambda trial: lgbm_objective(trial), n_trials=200)

    # Get the best parameters after optimization
    lgbm_best_params = lgbm_study.best_params
    print('='*50)
    print(lgbm_best_params)

LightGBM 1

lgbm_params_1 = {
    'learning_rate': 0.04090453688322824,
    'n_estimators': 788,
    'reg_lambda': 29.248167932522765,
    'reg_alpha': 0.4583079398945705,
    'max_depth': 19,
    'colsample_bytree': 0.5439642175304692,
    'subsample': 0.8659762900446526,
    'min_child_samples': 12,
    'num_leaves': 69,
    'random_state': 42,
    'n_jobs': -1,
    'verbose': -1
}
lgbm_reg_1 = TransformedTargetRegressor(
    lightgbm.LGBMRegressor(**lgbm_params_1),
    func=np.log1p,
    inverse_func=np.expm1
)
lgbm_reg_1.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
    regressor=LGBMRegressor(colsample_bytree=0.5439642175304692,
        learning_rate=0.04090453688322824, max_depth=19,
        min_child_samples=12, n_estimators=788, n_jobs=-1, num_leaves=69,
        random_state=42, reg_alpha=0.4583079398945705,
        reg_lambda=29.248167932522765, subsample=0.8659762900446526,
        verbose=-1))
mean_squared_log_error(y_valid, lgbm_reg_1.predict(X_valid), squared=False)
0.1477381305885499
feature_importance = lgbm_reg_1.regressor_.feature_importances_
feature_names = X_train.columns

sorted_indices = feature_importance.argsort()
sorted_importance = feature_importance[sorted_indices]
sorted_features = feature_names[sorted_indices]

# Plot feature importance
plt.figure(figsize=(12, 8))
colors = plt.cm.Paired.colors[:len(sorted_features)]
plt.barh(sorted_features, sorted_importance, color=colors)
plt.xlabel('Importance', fontsize=12)
plt.ylabel('Feature', fontsize=12)
plt.title('LightGBM Feature Importance', fontsize=14)
plt.gca().invert_yaxis()

# Annotate each bar with its importance value
for i, v in enumerate(sorted_importance):
    plt.text(v + 0.02, i, f'{v:.2f}', color='black', va='center', fontsize=10)

plt.tight_layout()
plt.show()

LightGBM 2

lgbm_params_2 = {
    'n_jobs': -1,
    'verbose': -1,
    'max_depth': 20,
    'num_leaves': 165,
    'subsample_freq': 1,
    'random_state': 42,
    'n_estimators': 1460,
    'min_child_samples': 25,
    'reg_lambda': 6.13475387151606,
    'subsample': 0.8036874216939632,
    'reg_alpha': 0.3152990674231573,
    'learning_rate': 0.009336479469693189,
    'colsample_bytree': 0.5780931837049811,
    'min_child_weight': 0.37333232256934057,
}
lgbm_reg_2 = TransformedTargetRegressor(
    lightgbm.LGBMRegressor(**lgbm_params_2),
    func=np.log1p,
    inverse_func=np.expm1
)
lgbm_reg_2.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
    regressor=LGBMRegressor(colsample_bytree=0.5780931837049811,
        learning_rate=0.009336479469693189, max_depth=20,
        min_child_samples=25, min_child_weight=0.37333232256934057,
        n_estimators=1460, n_jobs=-1, num_leaves=165, random_state=42,
        reg_alpha=0.3152990674231573, reg_lambda=6.13475387151606,
        subsample=0.8036874216939632, subsample_freq=1, verbose=-1))
mean_squared_log_error(y_valid, lgbm_reg_2.predict(X_valid), squared=False)
0.14758741259851116

CatBoost

def cb_objective(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.5),
        "max_depth": trial.suggest_int("depth", 4, 16),
        "l2_leaf_reg": trial.suggest_float("l2_leaf_reg", 1, 10),
        "n_estimators": trial.suggest_int("n_estimators", 100, 1500),
        "colsample_bylevel": trial.suggest_float("colsample_bylevel", 0.5, 1.0),
    }
    cb_reg = TransformedTargetRegressor(
        catboost.CatBoostRegressor(**params, random_state=42, grow_policy='SymmetricTree',
                                   random_strength=0, cat_features=["Sex"], loss_function="RMSE"),
        func=np.log1p,
        inverse_func=np.expm1
    )
    cb_reg.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
    val_scores = np.sqrt(mean_squared_log_error(y_valid, np.abs(cb_reg.predict(X_valid))))
    return val_scores

sampler = optuna.samplers.TPESampler(seed=42)  # Using Tree-structured Parzen Estimator sampler for optimization
cb_study = optuna.create_study(direction='minimize', study_name="CBRegressor", sampler=sampler)
[I 2024-04-25 16:46:44,532] A new study created in memory with name: CBRegressor
if TUNE:
    cb_study.optimize(cb_objective, 30)

CatBoost 1

cb_params_1 = {'grow_policy': 'SymmetricTree', 'n_estimators': 1000, 'learning_rate': 0.128912681527133, 'l2_leaf_reg': 1.836927907521674, 'max_depth': 6, 'colsample_bylevel': 0.6775373040510968, 'random_strength': 0, 'boost_from_average': True, 'loss_function': 'RMSE', 'cat_features': ['Sex'], 'verbose': False}
cat_reg_1 = TransformedTargetRegressor(catboost.CatBoostRegressor(**cb_params_1),func=np.log1p,inverse_func=np.expm1)
cat_reg_1.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
    regressor=<catboost.core.CatBoostRegressor object at 0x0000021D08EB7310>)
mean_squared_log_error(y_valid, cat_reg_1.predict(X_valid), squared=False)
0.14824841372252795

CatBoost 2

# Duplicate 'verbose' and 'random_state' keys removed; the values that took effect are kept.
cb_params_2 = {
    'depth': 15,
    'max_bin': 464,
    'verbose': False,
    'random_state': 42,
    'task_type': 'CPU',
    'min_data_in_leaf': 78,
    'loss_function': 'RMSE',
    'grow_policy': 'Lossguide',
    'bootstrap_type': 'Bernoulli',
    'subsample': 0.83862137638162,
    'l2_leaf_reg': 8.365422739510098,
    'random_strength': 3.296124856352495,
    'learning_rate': 0.09992185242598203,
}
cat_reg_2 = TransformedTargetRegressor(
    catboost.CatBoostRegressor(**cb_params_2),
    func=np.log1p,
    inverse_func=np.expm1
)
cat_reg_2.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
                       regressor=<catboost.core.CatBoostRegressor object at 0x0000021D088EB4C0>)
mean_squared_log_error(y_valid, cat_reg_2.predict(X_valid), squared=False)
0.14774083919007364
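
Both CatBoost models are fit with an RMSE loss on `log1p`-transformed targets, which matches the RMSLE metric used for the scores above. The sketch below illustrates that equivalence on made-up numbers; the arrays `y_true` and `y_pred` are illustrative assumptions, not values from the competition data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_squared_log_error

# Illustrative values only; not taken from the abalone data.
y_true = np.array([5.0, 9.0, 11.0, 14.0])
y_pred = np.array([6.0, 8.5, 10.0, 15.5])

# RMSLE: square root of the mean squared log error.
rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))

# RMSE measured in log1p space, which is the quantity the inner regressor of
# TransformedTargetRegressor(func=np.log1p, inverse_func=np.expm1) is trained on.
rmse_log = np.sqrt(mean_squared_error(np.log1p(y_true), np.log1p(y_pred)))

print(rmsle, rmse_log)  # the two numbers agree up to floating-point error
```

Taking `np.sqrt` explicitly avoids relying on the `squared=False` argument, which newer scikit-learn releases deprecate in favour of a dedicated `root_mean_squared_log_error` function.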

Ensembling the Results Using VotingRegressor

# weights = [0.025, 0.025, 0.275, 0.275, 0.05, 0.35]  # six weights, one per model; unused while XGBoost is disabled
ensemble = VotingRegressor([
    # ("xgb_1", xgb_reg_1), ("xgb_2", xgb_reg_2),
    ("lgbm_1", lgbm_reg_1), ("lgbm_2", lgbm_reg_2),
    ("cb_1", cat_reg_1), ("cb_2", cat_reg_2),
])
ensemble.fit(X, y)
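
To make the behaviour of `VotingRegressor` concrete, here is a small sketch on synthetic data. The base models, the weights, and every `demo_*` name are stand-ins chosen for illustration and are not the tuned models above; the point is only that the ensemble prediction is the weighted mean of its members' predictions.

```python
import numpy as np
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data, for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Hypothetical stand-in models and weights.
demo_weights = [0.7, 0.3]
demo_ensemble = VotingRegressor(
    [
        ("lin", LinearRegression()),
        ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
    ],
    weights=demo_weights,
)
demo_ensemble.fit(X_demo, y_demo)

# Recompute the ensemble output by hand from the fitted members.
member_preds = np.column_stack(
    [member.predict(X_demo) for member in demo_ensemble.estimators_]
)
manual = np.average(member_preds, axis=1, weights=demo_weights)

print(np.allclose(demo_ensemble.predict(X_demo), manual))
```

Note that `fit` trains fresh clones of each member on the data it is given, so an ensemble fitted on the full `X, y` does not reuse the models that produced the validation scores.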


Submit the Output

pred = ensemble.predict(test.drop("id", axis=1))
submission = pd.DataFrame(test.id)
submission["Rings"] = pred
submission.to_csv("submission.csv", index=False)
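
Before uploading, it can help to sanity-check the frame that gets written out. The helper below is a hypothetical sketch (the name `check_submission` and the toy frame are assumptions for illustration); it checks only the shape implied by the code above, namely an `id` column plus a `Rings` column with one non-negative prediction per test row.

```python
import pandas as pd


def check_submission(sub: pd.DataFrame, expected_rows: int) -> None:
    """Raise AssertionError if the frame does not look like a valid submission."""
    assert list(sub.columns) == ["id", "Rings"], "unexpected columns"
    assert len(sub) == expected_rows, "row count differs from the test set"
    assert sub["id"].is_unique, "duplicate ids"
    assert sub["Rings"].notna().all(), "missing predictions"
    # RMSLE is computed on log1p values, so predictions should be non-negative.
    assert (sub["Rings"] >= 0).all(), "negative predictions"


# Toy frame standing in for the real submission.
demo_submission = pd.DataFrame({"id": [0, 1, 2], "Rings": [9.8, 10.1, 7.4]})
check_submission(demo_submission, expected_rows=3)
```

With the real data the call would be along the lines of `check_submission(submission, expected_rows=len(test))`.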
