2024 1st Place Solution
Overview
My final model (CV/Private LB of 5.8117/5.4030) was a blend of CatBoost (5.8240/5.4165), GRU (5.8481/5.4259), and Transformer (5.8619/5.4296), with weights of 0.5, 0.3, and 0.2 respectively, searched on the validation set. All three models share the same 300 features.
In addition, online learning (OL) and post-processing (PP) also played an important role in my final submission.
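To make the weight search concrete, here is a minimal sketch of how blend weights could be searched on a holdout set. It is not my exact code: the function name `search_weights`, the 0.1 grid step, and the use of mean absolute error as the score are all assumptions for illustration.

```python
import itertools

import numpy as np


def search_weights(preds, y_true, step=0.1):
    """Grid-search non-negative blend weights that sum to 1.

    preds:  dict mapping a model name to its validation predictions (1-D array).
    y_true: validation targets (1-D array).
    Returns (best_weights, best_mae). MAE as the score is an assumption.
    """
    names = list(preds)
    n_steps = int(round(1 / step))
    best_w, best_mae = None, float("inf")
    # Enumerate integer weight units for all models but the last;
    # the last model receives whatever is left so the weights sum to 1.
    for combo in itertools.product(range(n_steps + 1), repeat=len(names) - 1):
        if sum(combo) > n_steps:
            continue
        units = list(combo) + [n_steps - sum(combo)]
        weights = [u / n_steps for u in units]
        blend = sum(w * preds[name] for w, name in zip(weights, names))
        mae = float(np.mean(np.abs(blend - y_true)))
        if mae < best_mae:
            best_w, best_mae = dict(zip(names, weights)), mae
    return best_w, best_mae
```

Calling `search_weights({"catboost": p_cb, "gru": p_gru, "transformer": p_tf}, y_valid)` with holdout predictions (hypothetical variable names) returns the best weight combination on the grid. Because the grid includes the single-model corners, the blended score can never be worse than the best individual model on the same validation set.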
Validation Strategy
My validation strategy is pretty simple: train on the first 400 days
and hold out the last 81 days as the validation set. The CV score
aligned very well with the leaderboard score, which made me believe
this competition wouldn't shake up too much, so I focused on improving
CV most of the time.
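As a rough sketch of this split, assuming the data is a pandas DataFrame with an integer day column named `date_id` (the column name and the helper `holdout_split` are assumptions for illustration):

```python
import numpy as np
import pandas as pd


def holdout_split(df, train_days=400, day_col="date_id"):
    """Split by time: the first `train_days` distinct days go to train,
    every later day goes to the holdout validation set."""
    days = np.sort(df[day_col].unique())
    if len(days) <= train_days:
        raise ValueError("not enough distinct days to leave a holdout set")
    cutoff = days[train_days]  # first day that belongs to validation
    train = df[df[day_col] < cutoff]
    valid = df[df[day_col] >= cutoff]
    return train, valid
```

With 481 distinct days this gives 400 training days and 81 validation days. Splitting on whole days keeps every row of a given day on the same side, so no future information leaks into training.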
Magic Features
My models have 300 features in the end. Most of th