I am feeling sick. Fever. Cough. Stuffy nose. And it’s wintertime. Do I have the flu? Likely. Plus I have muscle pain. More likely.
Bayesian networks are great for these types of inferences. We have variables, some whose values have been fixed. We are interested in the probabilities of some free variables given these fixed values.
In our example, we want the probability that we have the flu, given some symptoms we have observed, and the season we are in.
So far it looks like reasoning with conditional probabilities. Is there more to it? Yes. A lot more. Let’s scale up this example and the rest will emerge.
Towards A Large-scale Bayes Network
Imagine that our network models every possible symptom, every possible disease, the outcome of every possible medical test, and every possible external factor that might affect the probability of some disease. External factors break down into behavioral ones (smoking, being a couch potato, eating too much), physiological ones (weight, gender, age), and others. For good measure, let’s also throw in treatments. And side-effects.
By now there is enough useful medical knowledge to capture tens of thousands of variables (at the very least) and their interactions. For any set of symptoms, together with the values of some of the behavioral, physiological, and other external factors, we could estimate the probabilities of various diseases. And more. For a given disease, we could ask the network for the most likely symptoms. And way more. For example: I have a cough and a high fever but the flu has been ruled out, so what other diseases are likely? For a given diagnosis, our particular symptoms, and possibly additional factors such as our gender and age, we could ask it to recommend treatments.
Now we are getting somewhere. How does all this magic work? This is what we will explore here.
Connectivity
First question, where does the network come in? In modeling the interactions among the tens of thousands of variables.
Modeling all possible interactions among that many variables is nearly impossible. The network gives us a mechanism to cut through this complexity by letting us specify which interactions to model. The aim is a model that is rich enough. But not overly complex.
Speaking of interactions, how do we decide which ones to model? Typically via domain knowledge. In our case, leveraging the collective knowledge of the medical field acquired over millennia of clinical practice and research.
What would our Bayes net look like? Structurally, a giant directed graph with nodes for the various symptoms, diseases, medical tests, behavioral factors, physiological factors, and treatment options. With suitably chosen (or inferred) arcs to model significant interactions among them. Such as among specific symptoms and specific diseases.
Connectivity Refined
A Bayes network is structurally a directed graph, an acyclic one at that. Directed means that edges have a direction to them, which is why they are called arcs. Acyclic means there are no directed cycles. Here is an example of a directed cycle: A → B → C → A.
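The acyclicity constraint is easy to check mechanically. Here is a minimal sketch, assuming arcs are given as (tail, head) pairs; the function name is_acyclic is ours, and the check uses Kahn's algorithm (repeatedly removing nodes that have no incoming arcs), which is one standard way to do this, not something the post prescribes.

```python
from collections import defaultdict, deque

def is_acyclic(arcs):
    """Return True if the directed graph given by (tail, head) arcs has no directed cycle."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for tail, head in arcs:
        children[tail].append(head)
        indegree[head] += 1
        nodes.update((tail, head))
    # Kahn's algorithm: repeatedly remove nodes that have no incoming arcs.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # If a directed cycle exists, its nodes never reach indegree 0.
    return removed == len(nodes)

print(is_acyclic([("A", "B"), ("B", "C")]))              # True
print(is_acyclic([("A", "B"), ("B", "C"), ("C", "A")]))  # False
```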
Apart from the acyclicity constraint, the modeler has full control over what nodes to connect with arcs and how to orient them. That said, in complex real-world use cases such as the one we are discussing here (medical diagnosis) there is an appealing guiding principle.
Choose arcs to model direct causes. Orient them in the direction of causality.
So if A is a direct cause of B, we would add the arc A → B. Such a network is called a causal Bayes network.
A causal network’s structure is only as accurate as its variables and the fidelity of the causal relationships. For instance, the truth might be that A causes B and B causes C. But we might not even know of B’s existence. So the best we would be able to do is to model this via the arc A → C.
Causal Modeling
Okay, so let’s think causally in the medical setting. This is what we come up with.
Variable Type A        causes     Variable Type B    Example
disease                causes     symptom            flu causes you to cough
behavior               causes     disease            smoking causes lung cancer
physiological factor   causes     disease            aging “causes” various diseases
treatment              “causes”   disease            chemotherapy reduces cancer
treatment              causes     side-effect        chemotherapy causes hair-loss
Before closing this section, let’s note that we shouldn’t worry too much about getting a few causal arcs wrong. (Of course, we prefer not to.) The consequences are not severe. In fact, we’ll likely have quite a few non-causal arcs in the network anyhow, to model correlations whose links to causation are unclear or non-existent. Indeed, the network can’t even distinguish between causal and non-causal arcs. Not in our use case.
Take this example. Say A and B are strongly correlated. Say you thought A causes B, so you modeled this with the arc A → B. But you were wrong. Adding this arc is still a good thing, as it models the correlation. The next section discusses non-causal arcs in more detail.
Non-causal Arcs
Causality is a compelling guiding principle in the network’s design. However, it is not sufficient. That is, adding non-causal arcs can improve the model further.
Consider correlations among variables. Such as among a set of symptoms or a set of diseases. Causal relationships within the set may not be known or even exist. We do want to model the correlations though. So we should add suitable “non-causal” arcs.
Here is a simple example. Say there is strong belief or evidence that dry cough and irritated throat are correlated. Say these are the only two variables in the network. Connecting them with an arc in either direction will capture this correlation. Leaving the arc out will treat them as independent. We don’t want that.
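To see what the arc buys us in the two-variable example, here is a small sketch with made-up probabilities (not clinical data). With the arc dry cough → irritated throat, the joint is P(cough)P(throat|cough). Without it, the best the network can do is P(cough)P(throat), which misstates how often the two occur together.

```python
# Made-up numbers, for illustration only.
p_cough = 0.2                                   # P(dry cough = yes)
p_throat_given_cough = {True: 0.7, False: 0.1}  # P(irritated throat = yes | dry cough)

def bernoulli(p_yes, value):
    """Probability of a boolean value when P(yes) = p_yes."""
    return p_yes if value else 1.0 - p_yes

def joint_with_arc(cough, throat):
    """Joint under the network with the arc dry cough -> irritated throat."""
    return bernoulli(p_cough, cough) * bernoulli(p_throat_given_cough[cough], throat)

# Marginal P(irritated throat = yes), obtained by summing out dry cough.
p_throat = joint_with_arc(True, True) + joint_with_arc(False, True)

def joint_without_arc(cough, throat):
    """Joint under the network with no arc: the two variables are treated as independent."""
    return bernoulli(p_cough, cough) * bernoulli(p_throat, throat)

print(joint_with_arc(True, True))     # about 0.14
print(joint_without_arc(True, True))  # about 0.044
```

Both networks agree on the individual marginals. Only the one with the arc reproduces the co-occurrence rate.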
The Network’s Master Equation
At some juncture, just like a picture can reveal a vista, so can math. We are at that point. So here goes.
Formally, a Bayes Network is a directed acyclic graph on n nodes. The nodes, call them X1, X2, …, Xn, model random variables. The arcs model interactions among them.
More precisely, the structure of the network factors the joint distribution over the n variables as
P(X1, X2, …, Xn) = product_i P(Xi|parents(Xi))
There is a lot to unpack here. Let’s start with: parents(Xi) is the set of nodes with arcs coming into Xi. Huh?
Let’s ease into it with simple examples. All have the same 5 nodes A, B, C, D, E.
Our first network will have no arcs. So none of the nodes will have any parents either. So
P(A,B,C,D,E) = P(A)P(B)P(C)P(D)P(E)
Our second network will be a Markov chain. Structurally, the graph is a single path A → B → C → D → E. Node A does not have any parents. Node B’s parent is A. Node C’s parent is B. Etc. So
P(A,B,C,D,E) = P(A)P(B|A)P(C|B)P(D|C)P(E|D)
Our third network is the Naive Bayes classifier in which E serves as the class variable and A, B, C, and D as the predictor variables. Its graphical structure is
E → A, E → B, E → C, E → D
E has no parents. Each of A, B, C, and D has one parent: E. Accordingly
P(A,B,C,D,E) = P(A|E)P(B|E)P(C|E)P(D|E)P(E)
Readers familiar with naive Bayes classifiers will recognize the form on the right-hand side of this equation. Think of A, B, C, D as the predictors, E as the class variable.
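The master equation translates almost line for line into code. The sketch below assumes boolean variables and stores, for each node, P(node = True | parent values). The data layout (the parents and cpts dictionaries) and all numbers are our own illustrative choices. To keep it short it uses a three-node Markov chain A → B → C rather than the five-node examples above.

```python
from itertools import product

def joint_probability(assignment, parents, cpts):
    """P(x1, ..., xn) as the product over nodes of P(node | parents(node))."""
    prob = 1.0
    for node, value in assignment.items():
        parent_values = tuple(assignment[p] for p in parents[node])
        p_true = cpts[node][parent_values]  # P(node = True | parent values)
        prob *= p_true if value else 1.0 - p_true
    return prob

# Markov chain A -> B -> C with made-up probabilities.
parents = {"A": (), "B": ("A",), "C": ("B",)}
cpts = {
    "A": {(): 0.3},
    "B": {(True,): 0.8, (False,): 0.1},
    "C": {(True,): 0.6, (False,): 0.2},
}

# P(A=True, B=True, C=True) = P(A) P(B|A) P(C|B) = 0.3 * 0.8 * 0.6
all_true = joint_probability({"A": True, "B": True, "C": True}, parents, cpts)

# Sanity check: the factored joint sums to 1 over all 2^3 assignments.
total = sum(
    joint_probability(dict(zip("ABC", values)), parents, cpts)
    for values in product([True, False], repeat=3)
)
print(all_true, total)
```

Swapping in parents = {"E": (), "A": ("E",), ...} with matching tables gives the naive Bayes factorization with no change to joint_probability.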
Now we are ready for a clinical example.
Clinical Network Example: Flu and its Symptoms
Consider the network whose variables are flu, fever, cough, stuffy nose, and season. For simplicity suppose the first four are boolean (yes/no) and the fifth is categorical (spring, summer, fall, winter).
Causal modeling would yield the following arcs:
flu → fever, flu → cough, flu → stuffy nose
To these let’s add the arc flu → season. This is not a causal arc, so we could have flipped its direction. But we won’t, so that its direction stays aligned with the causal arcs emanating from flu. This will be convenient for the diagnosis covered in the next section.
Interestingly, it’s not a coincidence that this network’s structure is that of the naive Bayes classifier.
Diagnosis: From Symptoms To Flu
We want the probability that we have the flu, given that we have a fever, a cough, and a stuffy nose, and that it is wintertime. Let’s formally express this as
P(flu = yes | fever = yes, cough = yes, stuffy nose = yes, season = winter)
or more concisely (and a bit more generally) as
P(flu|fever,cough,stuffy nose, season)
To infer this, we just apply Bayes’ rule:
numerator(x) = P(fever|flu=x)*P(cough|flu=x)*P(stuffy nose|flu=x)*P(season|flu=x)*P(flu=x)
P(flu=yes|fever, cough, stuffy nose, season) = numerator(yes)/(numerator(yes)+numerator(no))
This is why this network is called a Bayesian network. The inference from symptoms to a disease involves Bayesian reasoning.
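Here is the same computation as a short sketch. Every probability below is invented for illustration and is not a clinical estimate. Each table holds the probability of the observed value (fever = yes, cough = yes, stuffy nose = yes, season = winter) for flu = yes and for flu = no.

```python
# Made-up probabilities, for illustration only.
p_flu = {"yes": 0.05, "no": 0.95}     # P(flu = x)
p_fever = {"yes": 0.9, "no": 0.05}    # P(fever = yes | flu = x)
p_cough = {"yes": 0.8, "no": 0.1}     # P(cough = yes | flu = x)
p_stuffy = {"yes": 0.7, "no": 0.1}    # P(stuffy nose = yes | flu = x)
p_winter = {"yes": 0.6, "no": 0.25}   # P(season = winter | flu = x)

def numerator(x):
    """Joint probability of the observations together with flu = x."""
    return p_fever[x] * p_cough[x] * p_stuffy[x] * p_winter[x] * p_flu[x]

# Bayes' rule: normalize over the two possible values of flu.
posterior = numerator("yes") / (numerator("yes") + numerator("no"))
print(round(posterior, 3))  # about 0.992 with these made-up numbers
```

Even with a prior of only 5%, four observations that are each much more likely under flu push the posterior close to 1.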
The “Beyond Flu” Network
We already have a prescription, so let’s execute. First, start adding nodes for additional diseases and symptoms. Second, add nodes for behaviors, physiological factors, medical tests, etc. Third, start adding more causality arcs, following the guidance given earlier. Such as
smoking → lung cancer, aging → disease-1, aging → disease-2, …, aging → disease-k
chemotherapy → cancer, chemotherapy → hair-loss
Next, start adding suitable non-causal arcs. To capture correlations among symptoms, correlations among diseases, etc.
The macrostructure of the “backbone” of such a network is below.
behaviors, physiological factors ⇒ diseases
treatments ⇒ diseases
diseases ⇒ symptoms
treatments ⇒ side-effects
tests?
The terms in plural denote sets of nodes of certain types. Such as diseases. X ⇒ Y denotes a set of arcs from X to Y. This level does not reveal the heads and tails of specific arcs.
We have already discussed why the arc sets are oriented the way they are. The reason we have chosen behaviors and physiological factors to jointly influence diseases is that these two types of factors interact. For instance, the adverse effect of certain bad behavior choices on certain diseases is often higher in older people than in younger people.
The macro-parents of diseases could in fact be more elaborate. Such as
behaviors, physiological factors, treatments ⇒ diseases
This would model the joint interaction of all three types of factors (behaviors, physiological factors, and treatments) on diseases. That said, such a macro-level interaction would in general produce quite a complex network. So to convey the essence of the backbone, we’ll stick to our earlier macro-structure. Exceptions, i.e. specific triplets of (behavior, physiological factor, treatment) that influence a particular disease, can always be added in. The macro-structure is just a big-picture view, not an enforceable schema. The schema exists only at the fine level, specified by the network’s arcs.
Notice we have a set of nodes, tests, which is dangling. We’ll let you ponder how this set should be connected to the rest of the network. Should we have tests ⇒ diseases, or diseases ⇒ tests, or some other?
Training the “Beyond Flu” Network
Training means estimating the various probability distributions P(Xi|parents(Xi)) of the model from data, belief, or a combination.
Training Symptom Distributions
Let’s start with learning the probability distribution of any one symptom conditioned on its parents. Let’s make a simplifying assumption that a symptom’s parents can only be diseases. For instance, parents of the symptom cough would include flu and bronchitis.
Given a symptom S and its parents pa(S), the conditional probability table to capture P(S|pa(S)) is exponential in the number of diseases in pa(S). This is because in principle any subset of the n diseases in pa(S) can occur. (By “occur” we mean diagnosed in a particular visit.) There are 2^n such subsets. This can be quite large when n is large.
Three factors will collectively mitigate this issue. One is that most symptoms will not have a huge number of parents, i.e. a huge number of diseases that can cause them.
The second is that in any one instance, the diagnosed diseases will be a sparse subset of the parents. A diagnosis instance corresponds to taking a snapshot of the state of the diseases of a particular person displaying the symptom. Of all the potential diseases the symptom can appear in, a single person will almost certainly be diagnosed with at most a few, if more than one at all. This sparsity will greatly help the training. Simply put, sparsity implies “no significant higher-order interactions”. A numeric example below will illustrate this phenomenon.
The third factor is that we have some control over what we deem to include in the set of parents pa(S) of a given symptom S. If a symptom’s parent set gets especially large, we can prune away diseases that are less correlated with the symptom.
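To put rough numbers on the first two points, the sketch below compares the size of a full conditional probability table over n boolean parent diseases with the number of parent configurations in which at most a few diseases are diagnosed at once. The function names and the cutoff of two simultaneous diagnoses are our assumptions, chosen only to illustrate the gap.

```python
from math import comb

def full_table_rows(n):
    """Number of parent configurations when any subset of n diseases may occur."""
    return 2 ** n

def sparse_configurations(n, max_diagnosed):
    """Number of configurations with at most max_diagnosed of the n diseases present."""
    return sum(comb(n, k) for k in range(max_diagnosed + 1))

for n in (5, 10, 30):
    print(n, full_table_rows(n), sparse_configurations(n, 2))
# For n = 30: 1073741824 rows in the full table, but only 466 sparse configurations.
```

If nearly all patient records fall into the sparse configurations, those are the only rows we need good estimates for.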
Discovering A Symptom’s Parents From Data
Which diseases should we set as the parents of a given symptom S? Previously we suggested, as a general guideline, using domain knowledge for this. In our particular case, there is a better way. Patient records will reveal which symptoms correlate with which diseases. So this aspect of the structure can also be fruitfully learned from data. The patient records capture within them the collective wisdom of lots of experts making diagnoses in varying scenarios.
The benefits of learning a symptom’s parents from the data are huge. It spares the network designer from having to acquire the domain knowledge to do this, whether via discussions with domain experts, extended readings, or some more elaborate mechanism. Even if this work were distributed over a large team of modelers and domain experts, such manual design is laborious and error-prone. There are too many symptoms and too many diseases.
That said, domain knowledge can still help fill in the gaps for situations that may not be covered by patient records, or to surface inconsistencies between belief and data. Simply put, domain-knowledge + data-driven learning is generally better than either alone.
We’ll discuss patient visit records in detail in the next section, as we will anyhow need them for learning the parameters of the network, such as the probabilities in P(S|pa(S)). Regardless of how we have arrived at the structure of pa(S).
Patient Visit Records
We’ll assume every interaction with a medical expert generates a new record, capturing the symptoms observed and the diseases diagnosed. If multiple diseases were diagnosed, the record also captures which of the observed symptoms were implicated in which disease, as deemed by the medical expert. The diagnosis may be as certain or as speculative as the expert sees fit. All we care about is that it was done by a professional.
Let’s see an example patient visit record. Made up. Not medical advice!
(symptoms = high fever, cough, sore throat, lump in throat; disease = flu)
(symptoms = lump in throat, chest pain; disease = gerd)
During this visit, two diseases were diagnosed: flu and GERD. The health expert implicated lump in throat in both.
From such a record we can derive symptom-centered representations, one for each observed symptom. Such a representation lists the diagnosed diseases in which that symptom was implicated during the visit. These diseases will also be referred to as the symptom’s parents in that visit record.
In our above example, lump in throat’s parents in the record are flu and GERD.
Symptom-centered representations lend themselves to learning symptom distributions.
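Deriving the symptom-centered representations from a visit record is a simple inversion. The sketch below encodes the made-up visit above as a list of diagnoses. The dictionary layout and the function name symptom_centered are our own choices for illustration.

```python
from collections import defaultdict

# The made-up visit record from above: one entry per diagnosed disease.
visit = [
    {"symptoms": ["high fever", "cough", "sore throat", "lump in throat"], "disease": "flu"},
    {"symptoms": ["lump in throat", "chest pain"], "disease": "gerd"},
]

def symptom_centered(visit):
    """Map each observed symptom to the diseases it was implicated in during the visit."""
    parents_in_visit = defaultdict(set)
    for diagnosis in visit:
        for symptom in diagnosis["symptoms"]:
            parents_in_visit[symptom].add(diagnosis["disease"])
    return dict(parents_in_visit)

representation = symptom_centered(visit)
print(sorted(representation["lump in throat"]))  # ['flu', 'gerd']
print(sorted(representation["cough"]))           # ['flu']
```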
Discovering A Symptom’s Parents
From the collection of symptom-centered representations derived from all the patient visit records we have access to, we can easily determine a symptom’s parents. These are all the diseases in which the symptom is implicated in this data. The parents of lump in throat would be flu and GERD if all we had to learn from were the single patient visit record above.
A huge and diverse set of patient visit records may yield, for some symptoms, huge sets of parents. As mentioned earlier, we can prune such large sets by dropping parents that are less correlated with the symptom.
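A minimal sketch of parent discovery with pruning follows. It assumes each visit has already been turned into a dictionary from symptom to implicated diseases. As a crude stand-in for "less correlated", it drops diseases implicated fewer than min_count times. Both the threshold rule and the sample data are our assumptions. A real system would likely use a proper association measure.

```python
from collections import Counter

def discover_parents(representations, symptom, min_count=1):
    """Diseases implicated in `symptom` in at least `min_count` visit records."""
    counts = Counter()
    for representation in representations:
        counts.update(representation.get(symptom, ()))
    return {disease for disease, count in counts.items() if count >= min_count}

# Made-up symptom-centered representations from three visits.
representations = [
    {"cough": {"flu"}, "lump in throat": {"flu", "gerd"}},
    {"cough": {"flu", "bronchitis"}},
    {"cough": {"asthma"}},
]

print(sorted(discover_parents(representations, "cough")))               # ['asthma', 'bronchitis', 'flu']
print(sorted(discover_parents(representations, "cough", min_count=2)))  # ['flu']
```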
Training Symptom Distributions From Patient Visit Records
We want to learn, for each symptom, its distribution conditioned on its parents. We have a symptom-centered data set available for this learning. (This was derived from patient visit records as described earlier.)
Consider any one instance in this data set. It lists a symptom, together with the diseases implicated with it during a patient visit. What it does not list is the diseases among the symptom’s parents that were not implicated. As we will see below, we need this information as well. Fortunately, we can deduce these diseases by subtracting the implicated diseases from the symptom’s parents.
Let’s see an example. Say cough’s parents are flu, pneumonia, and asthma. (In a real network this list would include a lot more diseases.) Say cough’s only parent in a particular patient record is flu. From this, we can deduce that in this instance cough is not caused by pneumonia or asthma. While this deduction is not correct with 100% certainty in any one instance, repeated occurrences of the same deduction do give a good estimate of the associated conditional probabilities.
From these two pieces of information (which diseases among a symptom’s parents are implicated in a particular patient record and which are not) we will derive a training vector of the following form.
cough flu pneumonia asthma
1 1 0 0
This is easy to read. It says that, in this patient record, cough is present, and of cough’s parents, flu is diagnosed, pneumonia is not diagnosed, and asthma is not diagnosed.
Next, consider a patient record whose observed list of symptoms does not include cough. Here we set the value of each of cough’s parents according to whether that disease is diagnosed in the record or not.
Here is an example. Say a patient record resulted in the diagnosis
(symptoms = shortness of breath, chest pain, wheezing; diseases = asthma)
From this, we may derive the record
cough flu pneumonia asthma
0 0 0 1
Armed with a rich enough collection of such records, which of course will keep growing as people will keep getting sick in the foreseeable future, we can learn P(cough|parents(cough)). More broadly, the distribution for any symptom conditioned on its parents.
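Putting the two kinds of training vectors together, here is a sketch of estimating P(cough | parents(cough)) by relative frequency. It simplifies the scheme above in one respect: a parent is marked 1 whenever the disease is diagnosed in the record, without tracking which symptoms the expert implicated in which disease. The records are made up, and counting is just one simple estimator.

```python
from collections import defaultdict

COUGH_PARENTS = ("flu", "pneumonia", "asthma")

def training_vector(symptoms, diseases, symptom="cough", parents=COUGH_PARENTS):
    """Return (symptom present?, tuple of 0/1 parent indicators) for one record."""
    return int(symptom in symptoms), tuple(int(p in diseases) for p in parents)

def estimate_cpt(vectors):
    """Estimate P(symptom = 1 | parent configuration) as a relative frequency."""
    counts = defaultdict(lambda: [0, 0])  # configuration -> [records, records with symptom]
    for present, configuration in vectors:
        counts[configuration][0] += 1
        counts[configuration][1] += present
    return {config: with_symptom / total for config, (total, with_symptom) in counts.items()}

# Made-up records: (observed symptoms, diagnosed diseases).
records = [
    ({"high fever", "cough"}, {"flu"}),
    ({"high fever"}, {"flu"}),
    ({"shortness of breath", "chest pain", "wheezing"}, {"asthma"}),
]
vectors = [training_vector(symptoms, diseases) for symptoms, diseases in records]
cpt = estimate_cpt(vectors)
print(vectors[0])      # (1, (1, 0, 0))
print(cpt[(1, 0, 0)])  # 0.5
```

Parent configurations that never appear in the data get no entry at all, which is where sparsity, pruning, or domain knowledge has to step in.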
Are such training instances, looked at individually, perfect? No. The absence of a disease in a diagnosis does not mean with certainty that it is not present, now or soon. The same applies to a symptom. That said, over a larger number of training instances in diverse-enough settings, such noise should get drowned out by the signal. For example, if only 30% of the records in which flu is diagnosed also reveal cough as an observed symptom, we can infer with high confidence that flu produces cough as an observed symptom no more than half the time.
Training The Influence Of Behaviors And Physiological Factors On Diseases
Here we refine the macro-structure
behaviors, physiological factors ⇒ diseases
We’ll assume the needed information may also be derived from patient records.
We seek to estimate, for every disease D, the parameters of D’s distribution conditioned on its parents. The parents of D are suitable subsets of the behaviors and physiological factors. Which behaviors and which physiological factors? These could be set via domain knowledge as a lot is known about which behaviors affect which diseases. (Adversely or beneficially.) Similarly for physiological factors. Alternatively or in addition, a disease’s parents could also be inferred from data.
Let’s illustrate such training from data. Consider the following patient record
smoker, 50 years old, male, diagnosed: lung cancer
First, from a collection of such records we can infer lung cancer’s parents, i.e. the behaviors and physiological factors that influence its diagnosis. As with symptom distributions, we need two more types of information to estimate the distribution of lung cancer given its parents.
1. In a particular diagnosis of lung cancer, which of the parents were missing?
2. How to estimate the probability that one does not have lung cancer in the presence of some of its parents?
For 1, as in the symptoms case, the missing parents are the full set of parents minus those in this patient record. For 2, again as in the symptoms case, we derive these from patient records in which some of lung cancer’s parents occur whereas the patient is diagnosed as being free of lung cancer. An example is a smoker who does not have lung cancer. How do we decide whether a factor is “key” or not? Try domain knowledge.
Training The Influence Of Treatments On Diseases
We have a problem here. Our macro-structure schema had
behaviors, physiological factors ⇒ diseases
treatments ⇒ diseases
That is, any single disease D would have two sets of parents, one involving certain combinations of behaviors and physiological factors, and the other involving treatments. We could, of course, combine these two sets of parents into one. Doing this widely has the issues discussed earlier. That said, specific triplets of behavior, physiological factor, and treatment in the context of specific diseases may be worth including. (As was discussed earlier.)
To summarize, we wouldn’t want to collapse
behaviors, physiological factors ⇒ diseases
treatments ⇒ diseases
into
behaviors, physiological factors, treatments ⇒ diseases
as a general rule.
Keeping Two Sets Of Parents Separate
So how do we keep the two sets of parents separate for a given disease D? One way is to introduce an additional variable for D (we’ll call it DI) as below.
behaviors, physiological factors ⇒ DI
treatments, DI ⇒ D
We can think of DI as modeling disease onset and D as modeling the disease’s next state, following one or more treatments. That said, this scheme is incapable of modeling the dynamic evolution of a disease in response to treatments. This would require D to be a parent of DI, which would violate the acyclicity constraint on a Bayes network.
Let’s see this in a specific example.
diet, age, gender → heart disease-I
heart disease-I, treatment → heart disease
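One way to see what the extra node buys us is to count table rows. The sketch below assumes every variable is boolean, which is a simplification (age, for one, is not). It compares a single table over all parents with the two smaller tables of the split design, one for DI and one for D. The function names are ours.

```python
def rows(num_boolean_parents):
    """Rows in a conditional probability table with that many boolean parents."""
    return 2 ** num_boolean_parents

def combined_rows(num_factors, num_treatments):
    """One table for D with behaviors, physiological factors, and treatments as parents."""
    return rows(num_factors + num_treatments)

def split_rows(num_factors, num_treatments):
    """Table for DI (parents: the factors) plus table for D (parents: treatments and DI)."""
    return rows(num_factors) + rows(num_treatments + 1)

# Heart disease example: diet, age, gender as factors, plus one treatment.
print(combined_rows(3, 1), split_rows(3, 1))    # 16 12
# The gap widens quickly with more parents.
print(combined_rows(10, 5), split_rows(10, 5))  # 32768 1088
```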
Treatments And Side-Effects
Let’s start simple. We have a node for every side-effect. We have a node for every treatment. A side-effect’s parents are all treatments that have that side-effect.
Let’s see an example.
chemotherapy, bone marrow transplantation, …, → fatigue
What is the value of including such arcs in our network? One is that it lets us seek treatments that are both effective for a particular disease and have relatively mild side-effects.
Inferences In This Scaled Network
Let’s start by repeating our network’s macro-structure here. This helps to see what types of inferences the network lends itself to.
behaviors, physiological factors ⇒ diseases
treatments ⇒ diseases
diseases ⇒ symptoms
treatments ⇒ side-effects
tests ?
Now onto specific inferences. Each is followed by an explanation of how it can be made to work. In this explanation, we focus on whether and how the various probabilities involved can be computed from data or domain knowledge. The aim is to provide insights into how the structure of the network simplifies various calculations.
In practice, one may be using an inference algorithm as a black-box, which will do whatever it does behind the scenes.
What is the likelihood of getting lung cancer if I smoke, am a female, and am 75 years old?
We seek P(lung cancer | smokes, female, 75 years old).
The good news is that all the observations this inference is conditioned on are lung cancer’s parents.
The bad news is that lung cancer may have additional parents. These need to be marginalized out. Marginalization involves averaging over the various values these additional parents can take, weighted by their probabilities. As the number of such values is exponential in the number of additional parents, marginalization is a slow process. Sophisticated algorithms do exist to speed it up. Their discussion is beyond the scope of this post.
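Brute-force marginalization can be sketched in a few lines. The example assumes the unobserved parents are independent of the observed ones, which holds when all of a disease's parents are root nodes with no arcs among them. The extra parent (radon exposure) and every number are hypothetical. The loop over all value combinations is exactly the exponential cost mentioned above.

```python
from itertools import product

def marginalize(p_disease_given, observed, unobserved_priors):
    """P(disease | observed parents), averaging over the unobserved boolean parents.

    p_disease_given: function mapping a full parent assignment to P(disease = yes | parents).
    observed: dict of observed parent values.
    unobserved_priors: dict mapping each unobserved parent to P(parent = True).
    """
    names = list(unobserved_priors)
    total = 0.0
    for values in product([True, False], repeat=len(names)):  # 2^k combinations
        weight = 1.0
        assignment = dict(observed)
        for name, value in zip(names, values):
            p_true = unobserved_priors[name]
            weight *= p_true if value else 1.0 - p_true
            assignment[name] = value
        total += weight * p_disease_given(assignment)
    return total

def p_lung_cancer(parents):
    """Made-up P(lung cancer = yes | smokes, radon)."""
    table = {(True, True): 0.15, (True, False): 0.10, (False, True): 0.02, (False, False): 0.01}
    return table[(parents["smokes"], parents["radon"])]

# P(lung cancer | smokes) = 0.1 * 0.15 + 0.9 * 0.10
p = marginalize(p_lung_cancer, {"smokes": True}, {"radon": 0.1})
print(round(p, 3))  # 0.105
```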
Frequently used restrictions of node distributions can be cached at the node. Think of this as attaching, to a node S, not only P(S|parents(S)) but also P(S|subset(parents(S))) for suitable subsets of parents(S). Such cached distributions may then be used as appropriate, reducing the need for on-the-fly marginalization.
I smoke, am a female, and am 75 years old. And I have a persistent cough. What is the likelihood I have lung cancer?
We seek P(lung cancer | smokes, female, 75 years old, persistent cough). By Bayes rule,
P(lung cancer | smokes, female, 75 years old, persistent cough) =
P(smokes, female, 75 years old, persistent cough | lung cancer)*P(lung cancer)/P(smokes, female, 75 years old, persistent cough)
(We’ll explain the bold-face font later.)
Next, we leverage an important property.
A node is conditionally independent of its non-descendants given its parents.
As this is the first time we are seeing this property in this post, let’s delve into it a bit. Consider the network A → B → C. (A Markov chain.) Applying the aforementioned conditional independence property, we get that C is independent of A given B. That is, P(C|B, A) equals P(C|B). In other words, once we have observed B, the value of A provides no additional information towards predicting the value of C.
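The chain A → B → C can be checked numerically. In the sketch below all three tables are made up. P(C|B, A) is computed from the joint distribution rather than read off a table, and it comes out the same for either value of A.

```python
# Made-up distributions for the chain A -> B -> C, all variables binary.
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}  # [a][b]
p_c_given_b = {True: {True: 0.6, False: 0.4}, False: {True: 0.1, False: 0.9}}  # [b][c]

def joint(a, b, c):
    """P(A=a, B=b, C=c) from the network factorization P(A) P(B|A) P(C|B)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

def p_c_given_b_and_a(c, b, a):
    """P(C=c | B=b, A=a), computed from the joint distribution."""
    return joint(a, b, c) / sum(joint(a, b, c2) for c2 in (True, False))

for a in (True, False):
    for b in (True, False):
        # For a fixed b the printed value does not change with a.
        print(a, b, round(p_c_given_b_and_a(True, b, a), 6))
```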
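Let’s apply this conditional independence property to our situation. Persistent cough is a child of lung cancer, while smokes, female, and 75 years old are non-descendants of persistent cough. Treating lung cancer as the only parent of persistent cough, we get
P(smokes, female, 75 years old, persistent cough | lung cancer) = P(smokes, female, 75 years old | lung cancer) * P(persistent cough | lung cancer)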
Okay, let’s now collect together all the terms in bold. These are what remain to be estimated. We have copied them below.
P(lung cancer)
P(smokes, female, 75 years old, persistent cough)
P(smokes, female, 75 years old | lung cancer)
P(persistent cough | lung cancer)
P(lung cancer) is easy to estimate from a sufficiently rich set of patient records. Some usable estimates may already exist in the public domain.
P(persistent cough|lung cancer) can also be estimated from patient records as the fraction of records diagnosed with lung cancer that have persistent cough as an observed symptom.
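Both estimates are plain counting. Here is a minimal sketch over a toy list of patient records; the field names and all eight records are invented for illustration, and real estimates would need far more data.

```python
# Toy patient records; field names and values are invented for illustration.
records = [
    {"lung_cancer": True, "persistent_cough": True},
    {"lung_cancer": True, "persistent_cough": True},
    {"lung_cancer": True, "persistent_cough": False},
    {"lung_cancer": False, "persistent_cough": True},
    {"lung_cancer": False, "persistent_cough": False},
    {"lung_cancer": False, "persistent_cough": False},
    {"lung_cancer": False, "persistent_cough": False},
    {"lung_cancer": False, "persistent_cough": False},
]

cancer_records = [r for r in records if r["lung_cancer"]]

# P(lung cancer): fraction of all records with the diagnosis.
p_lung_cancer = len(cancer_records) / len(records)

# P(persistent cough | lung cancer): fraction of the diagnosed records showing the symptom.
p_cough_given_cancer = sum(r["persistent_cough"] for r in cancer_records) / len(cancer_records)

print(p_lung_cancer, p_cough_given_cancer)
```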
To estimate P(smokes, female, 75 years old, persistent cough), we’ll invoke the independence assumption. This leaves us with P(smokes), P(75 years old), P(persistent cough), and P(female), whose product is the estimate. The first three are easy to estimate from data combined with knowledge. The last one we can just set to 0.5.
As a slight digression, strictly speaking, the variables mentioned in the previous paragraph are not all entirely independent. For instance, women live longer than men so age and gender are at least mildly dependent.
Finally, we are left with P(smokes, female, 75 years old|lung cancer). Conditioning smokes, female, and 75 years old on lung cancer makes the three conditionally dependent. So we should avoid invoking independence if we can. If we can’t, well, it’s not the end of the world. The resulting inference is still meaningfully interpretable. Specifically, it operates as a Naive Bayes classifier that predicts lung cancer from smokes, female, age, and persistent cough, with these features treated as conditionally independent given the outcome.
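The Naive Bayes reading can be sketched in a few lines of Python. Every number below is invented for illustration and is not a medical estimate. One deliberate difference from the text: instead of estimating P(smokes, female, 75 years old, persistent cough) separately, the sketch normalizes over the two outcomes, which keeps the posterior between 0 and 1.

```python
from math import prod

# All numbers below are invented for illustration, not medical estimates.
prior = {"cancer": 0.01, "no_cancer": 0.99}
likelihood = {
    "cancer": {"smokes": 0.70, "female": 0.45, "age_75": 0.15, "persistent_cough": 0.60},
    "no_cancer": {"smokes": 0.20, "female": 0.50, "age_75": 0.05, "persistent_cough": 0.05},
}
observed = ["smokes", "female", "age_75", "persistent_cough"]

def naive_bayes_posterior(prior, likelihood, observed):
    """P(outcome | observed features), with features independent given the outcome."""
    scores = {
        outcome: prior[outcome] * prod(likelihood[outcome][f] for f in observed)
        for outcome in prior
    }
    evidence = sum(scores.values())  # plays the role of P(observed features)
    return {outcome: score / evidence for outcome, score in scores.items()}

posterior = naive_bayes_posterior(prior, likelihood, observed)
print(posterior)
```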
Macro Lesson
The macro lesson from the above example is that when seeking to diagnose a disease from some observed physiological factors and some observed symptoms, the physiological factors can be reasonably assumed to be independent of the symptoms given the disease. Sure, older people may be more likely to exhibit certain symptoms than younger ones. However, when we additionally condition on a disease that could explain the symptom, the added influence of being old is small in comparison.
What cancer treatments have minimal side-effects?
Let’s express this in terms of a hybrid of logic and probabilities. We seek treatments T such that P(cancer|T) is high and for every side-effect SE, P(SE|T) is low. The key observation here is that in both probabilities, the variable being conditioned on is among the parents of the variable whose probability distribution we seek to compute. (In the previous sentence, if the word “variable” is causing confusion, replace it by “event”.) Thus we can leverage the network’s structure to compute what we want efficiently.
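The hybrid of logic and probabilities can be sketched as a filter. The treatment names, side-effect names, probabilities, and the two thresholds (high, low) are all placeholders chosen for illustration. In a real network the probabilities would come from inference, not from hard-coded tables.

```python
# Invented probabilities; treatment and side-effect names are placeholders.
p_cancer_given_treatment = {"T1": 0.95, "T2": 0.90, "T3": 0.10}
p_side_effect_given_treatment = {
    "T1": {"nausea": 0.05, "fatigue": 0.08},
    "T2": {"nausea": 0.40, "fatigue": 0.10},
    "T3": {"nausea": 0.01, "fatigue": 0.02},
}

def select_treatments(high=0.8, low=0.1):
    """Treatments T with P(cancer|T) >= high and P(SE|T) <= low for every side-effect SE."""
    return [
        t
        for t, p in p_cancer_given_treatment.items()
        if p >= high and all(q <= low for q in p_side_effect_given_treatment[t].values())
    ]

selected = select_treatments()
print(selected)
```

The "for every side-effect" part of the query is the all(...) call; the probabilistic part is the pair of threshold comparisons.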
Further Reading
https://www.sciencedirect.com/science/article/pii/S1532046418302041
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5519723/ In this article, disease and symptom mentions are also extracted from unstructured text such as nurse notes. Named entity recognition (NER) techniques are useful for this purpose. (In this case, the named entities are diseases and symptoms.) Check out https://towardsdatascience.com/named-entity-recognition-in-nlp-be09139fa7b8 for more on NER.
http://www.cs.cmu.edu/~guestrin/Class/10701-S05/slides/bns-inference.pdf Insightful example here
flu, allergy → sinus, sinus → headache, sinus → nose
Read this as “flu or allergy cause sinus, sinus causes a headache, and sinus can hamper the proper functioning of your nose”.
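This small network is handy for trying out the conditional independence property from earlier. The sketch below stores only the structure (who the parents are) and lists a node's non-descendants. The helper names are my own and the code is not taken from the slides.

```python
# Parents of each node in the example network.
parents = {
    "flu": [],
    "allergy": [],
    "sinus": ["flu", "allergy"],
    "headache": ["sinus"],
    "nose": ["sinus"],
}

def descendants(node):
    """All nodes reachable from `node` by following parent -> child edges."""
    children = [n for n, ps in parents.items() if node in ps]
    found = set(children)
    for child in children:
        found |= descendants(child)
    return found

def non_descendants(node):
    """Every other node that is not a descendant of `node`."""
    return set(parents) - descendants(node) - {node}

print(sorted(non_descendants("headache")))
```

Given its only parent, sinus, headache is independent of the remaining non-descendants: flu, allergy, and nose.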
Translated from: https://towardsdatascience.com/modeling-with-bayesian-networks-c7ebf28a8b6b