
Principal Component Analysis, Explained in Detail

The caption in the online magazine “WIRED” caught my eye one night a few months ago. When I focused my eyes on it, it read: “Can everything be explained to everyone in terms they can understand? In 5 Levels, an expert scientist explains a high-level subject in five different layers of complexity - first to a child, then a teenager, then an undergrad majoring in the same subject, a grad student, and finally, a colleague”.


Curious, I clicked on the link and started learning about exciting new concepts I could finally grasp with my own level of understanding. Music, Biology, Physics, Medicine - all seemed clear that night. Needless to say, I couldn’t stop watching the series and went to bed very very late.


Screenshot of the WIRED website, showing the collection of “5 Levels” concepts (image: screenshot)

I actually started writing this article while working on a more technical piece. What began as a few paragraphs inside that text kept growing until I felt the original article could no longer hold its weight. Could I explain the key concepts to peers and co-workers, as well as to children and people without a mathematical orientation? How far along will readers follow the explanations?


Let’s find out :)


1) Child


Sometimes, when we learn new things, we are told lots of facts and might be shown a drawing or a table with numbers. Seeing a lot of numbers and tables can be confusing, so it would be really helpful if we could reach the same conclusions using fewer of these facts, tables, and numbers.


Principal Component Analysis (or PCA for short) is what we call an algorithm: a set of instructions to follow. If we represent all our facts and tables using numbers, following these instructions will allow us to represent them using fewer numbers.


If we convert these numbers back into facts, we can still draw the same conclusions. But because we drew them using fewer facts than before, performing PCA just helped us be less confused.


2) High-school-aged teenager


Our math studies focus on a few key areas that give us a basis for mathematical understanding and orientation. High school math often means learning Calculus, which deals with subjects like function analysis and analytic geometry. Let’s use them to better explain Principal Component Analysis.


Polynomial function (doodle): you probably encountered functions like this one at one time or another in high school (image by author)

Functions are special objects in mathematics that give us an output value when we feed them an input value. This makes them very useful for learning certain relationships in data. A key technique taught in high school is how to draw the graph of a function by performing a short analysis of its properties. We use something from Calculus called the derivative: the derivative tells us the slope of a function at a given point. We choose a few key points inside the area of our graph, find the values our function takes at these points, and calculate the derivative at each of them to get hints about the function's slope. Then we use the points and the slopes we just got to draw an approximation of the function's shape.

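If you want to play with this idea on a computer, here is a tiny Python sketch of the same trick (the function and the sample points are made up for illustration): approximate the slope at a few key points with a small finite difference, and those points plus slopes are enough to sketch the curve.

```python
import numpy as np

def f(x):
    # An example polynomial, chosen arbitrarily for illustration
    return x**3 - 3*x**2 + 2

def slope_at(x, h=1e-5):
    # Finite-difference approximation of the derivative at x
    return (f(x + h) - f(x - h)) / (2 * h)

key_points = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
for x in key_points:
    print(f"x = {x:+.1f}  f(x) = {f(x):+8.3f}  slope ≈ {slope_at(x):+8.3f}")
```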

“Evil function” doodle: but sometimes functions can get evil… (image by author)

In the real world, sometimes we have lots of functions, and even strange things like functions that output multiple values at the same time or functions that require multiple inputs to give us values. And if you are the kind of person who finds high-school functions dreadful, imagine how dreadful these multi-number functions can be, even for skilled adults! Principal Component Analysis is a technique that takes the outputs of all these functions and gives us close approximations using far fewer numbers. Fewer numbers often mean memorizing less data, smaller and cheaper ways to store it, and less of that “being confused” feeling we get when we see so many numbers we don't even know where to start.


3) First-year University student


During your studies, you’ve learned all about linear algebra, statistics, and probability. You’ve dealt with a few of these “real world” functions that input and output multiple values and learned that they work with these things called vectors and matrices. You’ve also learned all about random variables, sample values, means, variances, distributions, covariances, correlation, and all the rest of the statistics techno-babble.


(image: youtoart.com)

Principal Component Analysis relies on a technique called “Singular Value Decomposition” (SVD for short). For now, let's treat it as what computer science calls a “black box”: an unknown function that gives us some desired output once we feed it the required input. If we collect lots of data (from observations, experiments, monitoring, etc.) and store it in matrix form, we can feed this matrix to our SVD function and get back a matrix of smaller dimensions that lets us represent our data with fewer values.

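In Python, for example, NumPy's numpy.linalg.svd routine can play the role of that black box. A minimal sketch with made-up data, just to show the call and the shapes it returns:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))   # 100 observations of 5 measured quantities

# The "black box": feed in a matrix, get back its SVD factors
U, singular_values, Vt = np.linalg.svd(data, full_matrices=False)

print(U.shape, singular_values.shape, Vt.shape)   # (100, 5) (5,) (5, 5)
```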

While for now, it might make sense to keep the SVD as a black box, it could be interesting to investigate the input and output matrices further. And it turns out the SVD gives us extra-meaningful outputs for a special kind of matrix called a “covariance matrix”. Being an undergrad student, you’ve already dealt with covariance matrices, but you might be thinking: “What does my data have to do with them?”


Left: Eugenio Beltrami; right: Camille Jordan. Two late-19th-century mathematicians who independently discovered SVD (images: public domain)

Our data actually has quite a lot to do with covariance matrices. If we sort our data into distinct categories, we can group related values in a vector representing that category. Each one of these vectors can also be seen as a random variable, containing n samples (which makes n the length of the vector). If we concatenate these vectors together, we can form a matrix X of m random variables, or a random vector with m scalars, holding n measurements of that vector.


The matrix of concatenated vectors, representing our data, can then be used to calculate the covariance matrix for our random vector. This matrix is then provided as an input for our SVD, which provides us with a unitary matrix as an output.

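As a small sketch of that construction (the categories and numbers below are invented for illustration), we can stack the per-category vectors into a matrix X and compute the covariance matrix of the resulting random vector:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200                               # number of measurements per category

# Three made-up "categories", each a random variable with n samples
height = rng.normal(170, 10, size=n)
weight = 0.5 * height + rng.normal(0, 5, size=n)
shoe   = 0.2 * height + rng.normal(0, 1, size=n)

# Concatenate the vectors column by column: X holds n measurements of m = 3 variables
X = np.column_stack([height, weight, shoe])

cov = np.cov(X, rowvar=False)         # the m-by-m covariance matrix of our random vector
print(cov.shape)                      # (3, 3)
```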

Without diving too deep into unitary matrices, the output matrix has a neat property: we can pick the first k column vectors from the matrix to generate a new matrix U, and multiply our original matrix X with the matrix U to obtain a lower-dimensional matrix. It turns out this matrix is the “best” representation of the data stored in X when mapped to a lower dimension.

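Putting the recipe together in a hedged sketch (placeholder data, and k = 2 chosen arbitrarily): feed the covariance matrix to SVD, keep the first k columns of the returned matrix, and multiply to obtain the lower-dimensional representation.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))          # 200 measurements of 3 variables (placeholder data)
cov = np.cov(X, rowvar=False)          # covariance matrix of the random vector

U, S, Vt = np.linalg.svd(cov)          # feed the covariance matrix to SVD

k = 2                                  # target dimension, chosen arbitrarily
U_k = U[:, :k]                         # the first k column vectors of U

X_reduced = X @ U_k                    # multiply X by U_k: a lower-dimensional matrix
print(X_reduced.shape)                 # (200, 2)
```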

4) Bachelor of Science


Graduating with an academic degree, you now have the mathematical background needed to properly explain Principal component analysis. But just writing down the math won’t be helpful if we don’t understand the intuition behind it or the problem we are trying to solve.


The problem PCA tries to solve is what we nickname the “curse of dimensionality”: When we try to train machine learning algorithms, the rule of thumb is that the more data we collect, the better our predictions become. But with every new data feature (implying we now have an additional random vector) the dimension of the vector space spanned by the feature vectors increases by one. The larger our vector space, the longer it takes to train our learning algorithms, and the higher the chance some of the data might be redundant.


Illustration of a neural network (image: towardsdatascience.com)

To solve this, researchers and mathematicians have constantly been trying to find techniques that perform “dimensionality reduction”: embedding our vector space in another space with a lower dimension. The inherent problem in dimensionality reduction is that for every decrease in the dimension of the vector space, we essentially discard one of the random vectors spanning the original vector space. That is because the basis for the new space has one vector fewer, making one of our random vectors a linear combination of the others, and therefore redundant for training our algorithm.


Of course, we do have naïve techniques for dimensionality reduction (like just discarding random vectors and checking which lost vector has the smallest effect on the accuracy of the algorithm). But the question asked is “How can I lose the least information when performing dimensionality reduction?” And it turns out the answer is “By using PCA”.

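Here is a sketch of that naive approach. Since the text doesn't fix a particular learning algorithm, the “accuracy” below is a stand-in: each candidate drop is scored by how well the remaining features can linearly reconstruct the discarded one.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=500)   # feature 3 is nearly redundant

def drop_cost(X, j):
    """Proxy for the damage of dropping feature j: how badly the remaining
    features fail to reconstruct it with ordinary least squares."""
    others = np.delete(X, j, axis=1)
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    residual = X[:, j] - others @ coef
    return np.mean(residual ** 2)

costs = [drop_cost(X, j) for j in range(X.shape[1])]
print("reconstruction error if dropped:", np.round(costs, 3))
print("cheapest feature to discard:", int(np.argmin(costs)))
```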

Formulating this mathematically, if we represent our data in matrix form as we did before, with n random variables having m measurements each, we represent each sample as a row xi. We are therefore searching for a linear map T: Rn → Rk that minimizes the Euclidean distances between xi and T(xi):


PCA problem formulation: find the linear map T that minimizes the total squared distance, argmin over T of Σi ‖xi − T(xi)‖²
What are we trying to do? To minimize information loss during dimensionality reduction

The intuition behind this formulation is that if we represent the image vectors T(xi) as n-dimensional vectors in the original vector space, then the less they have moved from their original positions, the smaller the loss of information. Why does distance imply loss of information? Because the closer a representation of a vector is to its original location, the more accurate the representation.


Two projections of a vector onto a lower-dimensional space: we can say T1 is a more accurate projection than T2 because it is closer to the original vector xi (image by author)
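To make the objective concrete, here is a small sketch that measures the quantity being minimized for one candidate orthonormal projection (the data and the projection are arbitrary, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))                 # 100 samples, n = 4

# A candidate map T onto k = 2 dimensions: project onto two arbitrary orthonormal directions
Q, _ = np.linalg.qr(rng.normal(size=(4, 2)))  # columns of Q are orthonormal

X_projected = X @ Q                           # T(xi) expressed in the k-dimensional space
X_back      = X_projected @ Q.T               # lifted back into the original space

# The quantity PCA minimizes: total squared distance between xi and its projection
loss = np.sum(np.linalg.norm(X - X_back, axis=1) ** 2)
print(f"information lost by this candidate map: {loss:.2f}")
```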

How do we find that linear map? It turns out that the map is provided by the Singular value decomposition! Recall that if we invoke SVD we obtain a unitary matrix U that satisfies


SVD(Cov(X)) = UΣVᵀ

Due to the isomorphism between the linear map space and the matrix space, we can see the matrix U as representing a linear map from Rn to Rn:


The vector space of linear maps from Rn to Rk is isomorphic to the matrix space R(n×k)

How exactly does SVD provide us with the function U? SVD is essentially a generalization of an important theorem in linear algebra called the Spectral theorem. While the spectral theorem can only be applied to normal matrices (those satisfying MM* = M*M, where M* is the conjugate transpose of M), SVD generalizes that result to any arbitrary matrix M by decomposing it into three matrices: M = UΣVᵀ.


According to the Spectral theorem, U is a unitary matrix whose row vectors form an orthonormal basis for Rn, with each row vector spanning an eigenspace orthogonal to the other eigenspaces. If we denote that basis as B = {u1, u2, …, un}, then because U diagonalizes the matrix M, it can also be viewed as a change-of-basis matrix between the standard basis and the basis B.


Illustration of a spectral decomposition of an arbitrary linear map into 3 orthogonal projections, each of which projects xi onto a matching eigenspace nested within R3 (image by author)

Of course, any subset Bk = {u1, u2, …, uk} is an orthonormal basis for a k-dimensional subspace, which we can identify with Rk. This means that performing the multiplication M·U⁻¹ (similar to multiplying U·M) corresponds to a linear map T: Rn → Rk. The theorems behind SVD generalize this result to any given matrix. And while these matrices aren't necessarily diagonalizable (or even square), the results regarding the unitary matrices still hold true for U and V. All we have left is to perform the multiplication X' = Cov(X)·U, which performs the dimensionality reduction we were searching for. We can now use the reduced matrix X' for any purpose we would have used the original matrix X for.

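As a quick, informal sanity check of that “best representation” claim, the sketch below compares the reconstruction error of the projection obtained from the SVD of Cov(X) with that of a random orthonormal projection of the same dimension (toy data, arbitrary k):

```python
import numpy as np

rng = np.random.default_rng(4)
# Correlated toy data: 300 samples in R^5 that mostly live near a 2-D subspace
basis = rng.normal(size=(2, 5))
X = rng.normal(size=(300, 2)) @ basis + 0.05 * rng.normal(size=(300, 5))
Xc = X - X.mean(axis=0)                      # center the data

k = 2
U, S, Vt = np.linalg.svd(np.cov(Xc, rowvar=False))
U_k = U[:, :k]                               # directions from the SVD of Cov(X)

Q, _ = np.linalg.qr(rng.normal(size=(5, k))) # a random orthonormal projection, for comparison

def recon_error(P):
    # squared distance between each sample and its projection onto span(P)
    return np.sum((Xc - Xc @ P @ P.T) ** 2)

print("error with SVD directions   :", round(recon_error(U_k), 3))
print("error with random directions:", round(recon_error(Q), 3))
```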

5) Expert data scientist


As an expert data scientist with a hefty amount of experience in researching and developing learning algorithms, you probably already know PCA and don't need abridged descriptions of its internal workings or of how to apply it in practice. I do find, however, that many times, even when we have been applying an algorithm for a while, we don't always know the math that makes it work.


If you’ve followed along with the previous explanations, we still have two questions left: Why does the matrix U represent the linear map T which minimizes the loss of information? And how does SVD provide us with such a matrix U?


The answer to the first question lies in an important result in linear algebra called the Principal Axis theorem. It states that every ellipsoid (or hyper-ellipsoid equivalent) shape in analytic geometry can be represented by a quadratic form Q: Rn → R of the shape Q(x) = x'Ax (here x' denotes the transpose of x), such that A is diagonalizable and has a spectral decomposition into mutually orthogonal eigenspaces. The eigenvectors obtained form an orthonormal basis of Rn and have the property of matching the axes of the hyper-ellipsoid. That's neat!


The Principal Axis theorem shows us the eigenvectors of A coincide with the axes of the ellipsoid defined by Q (image: researchgate.net)

The Principal Axis theorem shows a fundamental connection between linear algebra and analytic geometry. When plotting our matrix X and visualizing the individual values, we can see a (roughly) hyper-ellipsoid shape containing our data points. We can then draw a regression line for each dimension of our data, trying to minimize the distance between the data points and their orthogonal projections onto the line. Due to the properties of hyper-ellipsoids, these lines are exactly its axes. Therefore, the regression lines correspond to the spanning eigenvectors obtained from the spectral decomposition of the quadratic form Q.

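A small numerical illustration of that alignment, with a made-up two-dimensional Gaussian cloud: the eigenvectors of the sample covariance matrix line up with the long and short axes of the data ellipse.

```python
import numpy as np

rng = np.random.default_rng(5)
# Sample an elongated, rotated 2-D Gaussian cloud
cov_true = np.array([[3.0, 1.2],
                     [1.2, 0.8]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov_true, size=2000)

# Eigendecomposition of the sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))

# The eigenvector with the largest eigenvalue points along the long axis of the ellipse
major_axis = eigvecs[:, np.argmax(eigvals)]
print("axis scales (sqrt of eigenvalues):", np.round(np.sqrt(eigvals), 2))
print("direction of the major axis      :", np.round(major_axis, 2))
```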

The relationship and tradeoff between a vector, a possible orthogonal projection, and the projection's orthogonal complement (image by author)

Reviewing a diagram similar to the one used when formulating the problem, notice the tradeoff between the length of an orthogonal projection and the length of its orthogonal complement: the smaller the orthogonal complement, the larger the orthogonal projection. Even though X is written in matrix form, we must not forget it is a random vector, and as such its components exhibit variance across their measurements. Since P(x) is a projection, it inevitably loses some information about x, which accumulates into a loss of variance.


In order to minimize the variance lost (i.e. retain the maximum possible variance) across the orthogonal projections, we must minimize the lengths of the orthogonal complements. Since the regression lines coinciding with our data's principal axes minimize the orthogonal complements of the vectors representing our measurements, they also maximize the amount of variance kept from the original vectors. This is exactly why SVD “minimizes the loss of information”: because it maximizes the variance kept.

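This tradeoff is easy to check numerically: for an orthogonal projection onto the principal directions, the variance kept plus the variance left in the orthogonal complement adds up to the total variance, and the kept part equals the sum of the top eigenvalues. A sketch with toy data:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))   # correlated toy data
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]                          # sort eigenpairs, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
P = eigvecs[:, :k]                                         # top-k principal directions

total_var      = np.var(Xc, axis=0, ddof=1).sum()
projected_var  = np.var(Xc @ P, axis=0, ddof=1).sum()                # variance kept
complement_var = np.var(Xc - Xc @ P @ P.T, axis=0, ddof=1).sum()     # variance lost

print(round(total_var, 3), "=", round(projected_var, 3), "+", round(complement_var, 3))
print("variance kept equals the top-k eigenvalues:", np.allclose(projected_var, eigvals[:k].sum()))
```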

Now that we've discussed the theory behind SVD, how does it do its magic? SVD takes a matrix X as its input and decomposes it into three matrices U, Σ, and V that satisfy X = UΣVᵀ. Since SVD is a decomposition rather than a theorem, let's perform it step by step on the given matrix X.


First, we’re going to introduce a new term: a Singular vector of a matrix is any vector v that satisfies the equation


Xv = σu and Xᵀu = σv, for some unit vector u and some scalar σ ≥ 0 (the matching singular value)

Since X ∈ R(m×n), we have no guarantees as to its dimensions or its contents. X might not have eigenvalues (since eigenvalues are only defined for square matrices), but it always has at least ρ different singular vectors (with ρ(X) being the rank of X), each with a matching singular value.


We can then use X to construct a symmetric matrix


XᵀX (an n×n symmetric matrix)

Since for a real matrix the transpose is also the adjoint, X and Xᵀ share the same singular values. This makes each eigenvalue of XᵀX the square of the matching singular value of X:


λi(XᵀX) = σi²
The eigenvalues of XᵀX are the squares of the singular values of X

Because XᵀX is symmetric, it is also normal. Therefore, XᵀX is orthogonally diagonalizable: there exists a diagonal matrix Σ² such that XᵀX can be written as a product of three matrices:


XᵀX = VΣ²Vᵀ
The orthogonal diagonalization of XᵀX

where V is an orthonormal matrix made of eigenvectors of XᵀX. We will mark these vectors as


Bv = {v1, v2, …, vn}, an orthonormal basis of Rn obtained from the orthogonal diagonalization of XᵀX

We now construct a new group of vectors


Bu = {u1, u2, …, un}

where we define each member as


ui = (1/σi)·Xvi

Notice that because Bv is an orthonormal group, for every 1≤i≤j≤n


viᵀvj = 1 when i = j, and 0 otherwise

Therefore, we can prove that:


uiᵀuj = (1/(σiσj))·(Xvi)ᵀ(Xvj) = (1/(σiσj))·viᵀ(XᵀXvj) = (σj²/(σiσj))·viᵀvj = 1 when i = j, and 0 otherwise
Key equation #1: because Bv is an orthonormal group, Bu is also an orthonormal group

In addition, each ui is also an eigenvector of XXᵀ. This is because


XXᵀui = (1/σi)·X(XᵀXvi) = (1/σi)·X(σi²vi) = σi·Xvi = σi²·ui
Key equation #2: Bu consists of eigenvectors of XXᵀ

We can now complete the proof by expressing the relationship between Bu and Bv in matrix form:


X·[v1 | v2 | … | vn] = [σ1u1 | σ2u2 | … | σnun]
The relationship between ui and vi, expressed in matrix form

Then by standard matrix multiplication, UΣ = XV, and it immediately follows that


X = UΣV⁻¹ = UΣVᵀ (since V is orthonormal, V⁻¹ = Vᵀ)

The result we just achieved is impressive, but remember we constructed Bu as a group of n vectors using the vectors of Bv, meaning U∈R(m×n) while Σ∈ R(n×n). While this is a valid multiplication, U is not a square matrix and therefore cannot be a unitary matrix. To solve this, we “pad” the matrix Σ with zeros to achieve an m×n shape and extend U to an m×m shape using the Gram-Schmidt process.

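The whole construction is easy to verify numerically. The sketch below follows the steps above on a random matrix, assuming for simplicity a full-rank X with m ≥ n so that no zero-padding or Gram-Schmidt extension is needed: diagonalize XᵀX to get V and the σi, build each ui = (1/σi)·Xvi, and check that UΣVᵀ reproduces X.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 6, 4
X = rng.normal(size=(m, n))                    # a full-rank matrix with m >= n

# Step 1: orthogonally diagonalize X^T X to obtain V and the squared singular values
eigvals, V = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]              # sort from largest to smallest
eigvals, V = eigvals[order], V[:, order]
sigma = np.sqrt(eigvals)                       # singular values of X

# Step 2: build u_i = (1/sigma_i) * X v_i
U = (X @ V) / sigma                            # broadcasting divides each column by sigma_i

# Step 3: check the decomposition X = U Sigma V^T
Sigma = np.diag(sigma)
print("U has orthonormal columns:", np.allclose(U.T @ U, np.eye(n)))
print("X equals U Sigma V^T     :", np.allclose(U @ Sigma @ V.T, X))
```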

Now that we’ve finished the math part (phew…), we can start drawing some neat connections. First, while SVD can technically decompose any matrix, and we could just feed in the raw data matrix X, the Principal Axis theorem only works for diagonalizable matrices.


Second, the Principal axis theorem maximizes the retained variance in our matrix by performing an orthogonal projection onto the matrix’s eigenvectors. But who said our matrix captured a good amount of variance in the first place?


To answer these questions, and bring this article to an end, we will first restate that capturing variance between random variables is done by using a covariance matrix. And because the covariance matrix is symmetric and positive semi-definite, it is orthogonally diagonalizable AND has a spectral decomposition. We can then rewrite the formula for the covariance matrix using this neat equation, obtained by applying SVD to the matrices multiplied to form the covariance matrix:


Cov(X) = (1/(m−1))·XᵀX = V·(Σ²/(m−1))·Vᵀ (for mean-centered X)
Key equation #3: notice this is very similar to the orthogonal diagonalization of XᵀX

This is exactly why we said that SVD gives us extra-meaningful outputs when applied to the covariance matrix of X:


a) Not only is the singular value decomposition of Cov(X) identical to its spectral decomposition, it also diagonalizes it!


b) The diagonalizing matrix V is an orthonormal matrix made of unit eigenvectors of Cov(X), and it is exactly the matrix used to perform the principal component analysis of X!


c) Using Cov(X) captures the maximum amount of variance in our data, and by projecting it onto the basis eigenvectors associated with the k largest eigenvalues of Cov(X), we lose the smallest possible amount of variance when reducing our data by (n-k) dimensions.


d) The Principal Axis theorem ensures that we minimize the projection error from Rn to Rk when performing PCA using Cov(X) and V.

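To close the loop, here is a short numerical check of points (a) through (c) on toy data: the SVD of Cov(X) diagonalizes it with the same orthonormal directions as its spectral decomposition, and projecting onto the top k of them keeps exactly the top-k eigenvalue share of the variance.

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))    # correlated toy data
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)

# (a) the SVD of the covariance matrix coincides with its spectral decomposition
U, S, Vt = np.linalg.svd(C)
print("SVD diagonalizes Cov(X):", np.allclose(U.T @ C @ U, np.diag(S)))

# (b) + (c) project onto the k eigenvectors with the largest eigenvalues
k = 2
X_reduced = Xc @ U[:, :k]
kept = np.var(X_reduced, axis=0, ddof=1).sum() / np.var(Xc, axis=0, ddof=1).sum()
print(f"share of variance kept with k = {k}: {kept:.2%} (top-{k} eigenvalue share: {S[:k].sum() / S.sum():.2%})")
```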

Translated from: https://medium.com/@yonatanalon/principal-component-analysis-now-explained-in-your-own-terms-6f7a4af1da8
