An Introduction to Generalized Estimating Equations

A key assumption underpinning generalized linear models (of which linear regression is one type) is that observations are independent. In longitudinal data this assumption simply does not hold: observations within an individual (across time points) are likely to be more similar than observations from different individuals.

So, how do you deal with this? One option is to fit a generalized linear mixed model with random intercept and slope terms for each individual. This tells you the effect of a variable on an outcome for a specific individual (i.e. conditional on the random intercept and slope). However, this isn’t very useful if you are concerned with the marginal effect, i.e. the effect of a variable on an outcome on average in the population.

If you want to answer these population questions you need to fit a generalized linear model using generalized estimating equations (GEE). This is an approach that obtains the population average effect accounting for the fact that observations within individuals are likely to be more similar than those between individuals.

An example

Suppose our outcome is all-cause mortality, recorded every month for 10 months for every person. Suppose also that our exposure is simply time. We can now define a logistic regression model in which the sole independent variable is time (in months) and the dependent variable is death at that time. “Okay, great,” I hear you say, “but these observations are obviously not independent!” Spot on, but we’ll come to that.
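
To make the setup concrete, here is a minimal sketch of what such data could look like in long format (one row per person per month). The column names death, time and person.id match those used in the R section later on; the months of death and the coding rule (death stays 1 from the month of death onwards) are assumptions made purely for illustration.

```r
# Toy long-format data: 5 hypothetical people, 10 monthly records each
n_people <- 5
months <- 1:10

# Hypothetical month of death per person; values above 10 mean
# the person survived the whole follow-up
death_month <- c(3, 12, 7, 15, 10)

# expand.grid varies `time` fastest, so rows are grouped by person.id
df <- expand.grid(time = months, person.id = seq_len(n_people))

# Assumed coding: death is 0 before the month of death and 1 from then on
df$death <- as.integer(df$time >= death_month[df$person.id])

head(df, 12)
```

Each person contributes 10 rows, and the rows belonging to one person are clearly not independent of each other.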

Working correlation structures

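To use GEE we must first specify how observations at different time points are related. However, the coefficient estimates remain consistent even if we misspecify this relationship (provided the mean model is correct), and using Huber-White (sandwich) standard errors keeps the inference valid! So we have some choices.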

Independent

This working correlation structure assumes that time points are independent of each other. This is probably an unreasonable assumption in practice.

Exchangeable

Here the correlation between observations is the same for any pair of time points. This structure is commonly used because it requires just one additional parameter, α, to be estimated.
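
As an illustration, an exchangeable working correlation matrix can be built in a few lines of base R. The helper name exchangeable_corr and the values (4 time points, α = 0.3) are made up for this sketch.

```r
# Exchangeable structure: every off-diagonal entry equals alpha
exchangeable_corr <- function(n_times, alpha) {
  R <- matrix(alpha, nrow = n_times, ncol = n_times)
  diag(R) <- 1  # an observation is perfectly correlated with itself
  R
}

R_exch <- exchangeable_corr(n_times = 4, alpha = 0.3)
print(R_exch)
```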

Autoregressive

This is where the correlation between observations follows an autoregressive structure. Suppose we were using an AR-1 correlation matrix. This would mean that the correlation between month 1 and 2 of a person would be expected to be α and the correlation between month 1 and 3 of a person would be expected to be α², between month 1 and 4 would be α³ and so on.

This is most appropriate when you think time points that are closer together are more similar than those further apart.
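
The AR-1 pattern described above can be sketched in base R by raising α to the power of the lag between time points. The helper name ar1_corr and the values (4 time points, α = 0.5) are hypothetical.

```r
# AR-1 structure: correlation between times j and k is alpha^|j - k|
ar1_corr <- function(n_times, alpha) {
  lag <- abs(outer(seq_len(n_times), seq_len(n_times), "-"))
  alpha^lag
}

R_ar1 <- ar1_corr(n_times = 4, alpha = 0.5)
print(R_ar1)
# First row: 1, 0.5, 0.25, 0.125 (that is alpha^0, alpha^1, alpha^2, alpha^3)
```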

Unstructured

This is where we estimate a separate α for each possible pair of time points. This is the most general case, though you need a lot of data to be able to estimate all of the α parameters.
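
To see why the unstructured option is data-hungry, it helps to count the correlation parameters each structure needs for the 10 monthly measurements in the example. This is simple arithmetic in base R; the variable names are only for illustration.

```r
n_times <- 10

# Number of correlation parameters (alpha values) per working structure
n_params <- c(
  independence = 0,
  exchangeable = 1,
  ar1          = 1,
  unstructured = choose(n_times, 2)  # one alpha per pair of time points
)
print(n_params)
```

With 10 time points the unstructured matrix has 45 distinct correlations, compared with a single α for the exchangeable and AR-1 structures.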

Other choices

There do exist some other choices, but these aren’t widely used.

How to choose which one to use?

It’s simple. Either choose the most general structure your data can support (depending on sample size) or choose the one you think suits the data best. Either way, don’t sweat it! The estimates remain consistent even if the working correlation is misspecified; a well-chosen structure mainly improves efficiency.
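
One informal way to check how much the choice matters is to fit the same model under several working structures and compare the time coefficient and its robust standard error. The sketch below assumes the geepack package is installed and uses simulated toy data (the simulation settings are invented for illustration, with death coded 1 from the month of death onwards); it is not a formal selection procedure.

```r
library(geepack)

# Simulated toy data: 300 hypothetical people, 10 monthly records each
set.seed(2024)
n_people <- 300
death_month <- rgeom(n_people, prob = 0.08) + 1
df <- expand.grid(time = 1:10, person.id = seq_len(n_people))
df$death <- as.integer(df$time >= death_month[df$person.id])

# Fit the same mean model under three working correlation structures
structures <- c("independence", "exchangeable", "ar1")
fits <- lapply(structures, function(cs) {
  geeglm(death ~ time, data = df, id = person.id, waves = time,
         family = binomial, corstr = cs)
})
names(fits) <- structures

# Collect the time coefficient and its robust (sandwich) standard error
comparison <- t(sapply(fits, function(m) {
  s <- coef(summary(m))
  c(beta1 = s["time", "Estimate"], robust_se = s["time", "Std.err"])
}))
print(round(comparison, 3))
```

If the estimates are broadly similar across structures, the choice is unlikely to change the conclusions.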

How to fit the model

Fitting the model is simple. We just fit a GLM using GEE with our specified working correlation matrix:

logit(pij) = log(pij / (1 − pij)) = β0 + β1 × Tij, with pij = P(Yij = 1)

Where:

  • Yij is 1 if participant i died at time j
  • pij is the probability of death for participant i at time j
  • β0 is the population average log odds of death at time 0. This can be exponentiated to obtain the odds of death at time 0.
  • β1 is the population average difference in log odds of death associated with a one-month increase in time. This can be exponentiated to obtain the odds ratio associated with a one-month increase in time.
  • Tij is the time of the j’th measurement for participant i in months.

That’s it. On average in the population, a one-month increase in time multiplies the odds of death by exp(β1), the odds ratio.
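
As a quick worked example of this interpretation, suppose the fitted coefficient were β1 = 0.05 (a made-up value, not a result from any data). Because the model is linear on the log-odds scale, the odds ratio for a k-month increase is exp(k × β1).

```r
beta1 <- 0.05  # hypothetical coefficient on the log-odds scale

odds_ratio_1m <- exp(beta1)      # odds ratio for a one-month increase
odds_ratio_6m <- exp(6 * beta1)  # odds ratio for a six-month increase

round(c(one_month = odds_ratio_1m, six_months = odds_ratio_6m), 3)
# one_month is about 1.051 and six_months about 1.350
```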

Fitting a model in R

We can do this in R using the geepack package. Suppose our data frame already exists with three columns: death, time and person.id. All we have to do is:

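library(geepack)
# df is the data frame described above (placeholder name); rows must be sorted by person.id
mod <- geeglm(death ~ time, data = df, id = person.id, waves = time,
              family = binomial, corstr = "exchangeable")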

You can then just call summary(mod) as normal and get your results!
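
Since the coefficients are reported on the log-odds scale, it is often convenient to convert them to odds ratios with Wald confidence intervals based on the robust standard errors. The following is a minimal self-contained sketch, assuming geepack is installed and using simulated toy data (the simulation settings and the or_table name are invented for illustration).

```r
library(geepack)

# Simulated toy data: 200 hypothetical people, 10 monthly records each
set.seed(1)
n_people <- 200
death_month <- rgeom(n_people, prob = 0.08) + 1
df <- expand.grid(time = 1:10, person.id = seq_len(n_people))
df$death <- as.integer(df$time >= death_month[df$person.id])

mod <- geeglm(death ~ time, data = df, id = person.id, waves = time,
              family = binomial, corstr = "exchangeable")

# Coefficient table: Estimate and Std.err (robust) are on the log-odds scale
est <- coef(summary(mod))

# Exponentiate estimates and 95% Wald limits to get odds ratios
z <- qnorm(0.975)
or_table <- data.frame(
  odds_ratio = exp(est$Estimate),
  lower_95   = exp(est$Estimate - z * est$Std.err),
  upper_95   = exp(est$Estimate + z * est$Std.err),
  row.names  = rownames(est)
)
print(or_table)
```

The row labelled time gives the estimated odds ratio for a one-month increase.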

And of course, you could add covariates to the model by just adding them to the formula. They can be time-varying or constant; either is fine!
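
For instance, a time-constant covariate could be added as follows. This sketch assumes geepack is installed; the age column, the simulation settings and the mod_adj name are hypothetical and exist only to make the example runnable.

```r
library(geepack)

# Simulated toy data with a hypothetical baseline covariate `age`
set.seed(7)
n_people <- 300
age <- round(runif(n_people, min = 40, max = 80))
# Older people are given a higher monthly hazard in this toy simulation
death_month <- rgeom(n_people, prob = plogis(-4 + 0.03 * age)) + 1

df <- expand.grid(time = 1:10, person.id = seq_len(n_people))
df$age <- age[df$person.id]  # constant within person
df$death <- as.integer(df$time >= death_month[df$person.id])

# Same call as before, with the covariate added to the formula
mod_adj <- geeglm(death ~ time + age, data = df, id = person.id, waves = time,
                  family = binomial, corstr = "exchangeable")

# Population-averaged odds ratios for time and age
exp(coef(mod_adj))
```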

Conclusion

Hopefully you’ve come away from reading this with a basic idea of how GEE works. It should be a tool in the toolbox of any data scientist working with longitudinal data.

Translated from: https://towardsdatascience.com/an-introduction-to-generalized-estimating-equations-bc7dee570478
