Background
Wait! Isn’t the above equation different from what we found last time? Yup, very different, but it still looks exactly the same, or maybe a bit better. In case you are wondering what I am talking about, please refer to Part I of this series. Many of the ideas used here are derived in that article and explained in detail, the main ones being: why circles become like squares, how we can look at figures as intersections of trenches, and how we can build our own graphs and equations by intersecting those trenches.
The Challenge
Working with trenches isn’t a very joyful experience, is it? Honestly? They have two walls and we generally use only one. The other one just lingers around and sometimes creates unwanted regions; remember, we had to trim our Batman for exactly this reason. Next, we were able to perform only intersection and negation (flipping the sign of the power) of trenches, but their union, while very useful, is still a bit challenging. With great power comes great computational difficulty: raising inputs to very high powers is uncommon in practice and not very efficient computationally either. Thus, the scope of what we derived earlier was somewhat limited by our design tools.
The universal set of mathematics is infinite and we would never be able to express it as the union of our finite ideas. So let’s start finding solutions to our challenges!
From Trenches to Boundaries
Remember: whenever the powers become too large to control, logarithms and exponentials come to the rescue. The fundamental reason we had trenches was that a large even power forms two walls: one at y-f(x)=1 (which we generally use) and the other at y-f(x)=-1 (which we generally discard). So we have to make a change that gives only one wall per trench (which makes it just a wall). We can do this pretty easily: just replace x^(2n) with e^(nx). The main reason everything worked was that for absolute input values greater than 1 the function rose above 1 very fast, and for values less than 1 it stayed near zero. In the case of e^(nx), for positive x the output goes above 1 very fast, and for negative x it stays near zero. The first challenge is solved! Exponentials are commonly used and have fast implementations. That’s a nice property to have, ain’t it?
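To see the difference numerically, here is a minimal sketch (the function names `wall` and `trench` are mine, and n=50 is just the value used throughout this article):

```python
import math

def wall(x, n=50):
    """One-sided boundary e^(n x): near zero inside (x < 0), huge outside (x > 0)."""
    return math.exp(n * x)

def trench(x, n=50):
    """Two-sided trench x^(2n): blows up past BOTH x = 1 and x = -1."""
    return x ** (2 * n)

# The trench has two walls: it exceeds 1 on both sides...
print(trench(1.2), trench(-1.2))   # both huge
# ...while the exponential has a single wall at x = 0:
print(wall(-0.1), wall(0.1))       # ~0.0067 vs ~148.4
```

The single wall is exactly what lets us represent one-sided inequalities without trimming.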
A wall is always better than a trench, especially when no one has to pay for it (at least not computationally in our case).
Complement, Intersection and Union
Now that we have the right tool, let’s see what great things we can do with it. The boundary we just defined has near-zero values for negative x and tends to infinity for positive x. So what is a set here? A set is essentially an inequality like y≤x or x²+y²-1≤0, and its boundary is what we want to represent. We multiply by a large n and exponentiate both sides to get our boundaries. Thus y-x≤0 becomes e^(50(y-x))≤1, which is akin to (y-x)⁵⁰≤1 (one of our previous trench boundaries). The same logic we saw earlier applies to both cases.
Let’s look at the complement of our sets, which we can obtain by simply changing the sign of the power, as we did last time.
Next, let’s look at our favourite, intersection. It is the same as what we derived earlier, since nothing changes in the logic of a sum of small numbers staying below 1. This can be seen as follows:
And finally our newcomer, union. Let’s derive it. By De Morgan’s law, we know that A∪B = (A′∩B′)′: the inverse of the sum of the inverses of the sets. Ahhh! It’s just like how you evaluate the resistance of parallel resistors (1/Rt = 1/R1 + 1/R2). Or, for those who are familiar, it is the harmonic mean instead of the sum. Let’s see:
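All three operations can be written as plain functions on the exponential “set values” (a sketch; the helper names are mine, and the test points were chosen to avoid floating-point overflow):

```python
import math

def make_set(g, n=50):
    """Turn an inequality g(x, y) <= 0 into a set value:
    near 0 for points inside the set, huge for points outside."""
    return lambda x, y: math.exp(n * g(x, y))

def complement(a):
    # Flipping the sign of the power swaps inside and outside.
    return lambda x, y: 1.0 / a(x, y)

def intersection(*sets):
    # A sum of values stays below 1 only if ALL of them are near zero.
    return lambda x, y: sum(s(x, y) for s in sets)

def union(*sets):
    # De Morgan: (A' ∩ B')' — the "parallel resistors" formula.
    return lambda x, y: 1.0 / sum(1.0 / s(x, y) for s in sets)

# Unit disk and the half-plane y <= x:
disk = make_set(lambda x, y: x**2 + y**2 - 1)
half = make_set(lambda x, y: y - x)

u = union(disk, half)
print(u(0.0, 0.0) <= 1)    # in the disk        -> True
print(u(-1.0, 2.0) <= 1)   # outside both sets  -> False
```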
There is also a very important observation: the property of the value tending to zero for points inside the set and to infinity for points outside also holds for the results of the above set operations. These operations are therefore repeatable without exponentiating again, which means we can compute more complex expressions like A∪(B∩(C∪D)) by applying the operations one by one.
The union of a variety of ideas from algebra and set theory has guided us through the narrow intersection of mathematics and creative art. Summing up all our high-powered ideas, I can say that the difference from our goal will soon tend to zero. With one final activity, which I would like to exponentiate on, we will be all set to fit the application in Machine Learning.
Let’s Make Something!
Let’s make the following puzzle piece:
The equation above can be obtained by first creating a square, then taking the union with two circles (the protruding ones), followed by the intersection with the complements of the other two circles (the removed ones). Let the four edges of the square be represented as A, B, C and D; each edge is a single line. Let the circles undergoing union be E and F, and the circles being removed be G and H. The figure above is then: ((A ∩ B ∩ C ∩ D) ∪ E ∪ F) ∩ G′ ∩ H′.
Let’s represent the sets:
- A: e^(50(x-1))
- B: e^(-50(x+1)) (minus sign to change the direction of the set, making it face inside the figure)
- C: e^(50(y-1))
- D: e^(-50(y+1)) (minus sign to change the direction of the set)
- E: e^(50((x+1)²+y²-0.25))
- F: e^(50(x²+(y-1)²-0.25))
- G: e^(50((x-1)²+y²-0.25))
- H: e^(50(x²+(y+1)²-0.25))
Performing (A ∩ B ∩ C ∩ D) gives: (e^(50(x-1))+e^(-50(x+1))+e^(50(y-1))+e^(-50(y+1)))
This is followed by taking union with the circles thus giving:
((e^(50(x-1))+e^(-50(x+1))+e^(50(y-1))+e^(-50(y+1)))^-1+e^(-50((x+1)²+y²-0.25))+e^(-50((x)²+(y-1)²-0.25)))^-1
Finally performing intersection with the complement of other 2 circles gives the desired equation shown in the figure above.
All the above operations are also compatible with what we learnt previously, so the square can take back its true form: x⁵⁰+y⁵⁰=1. This gives an alternate equation for the same figure, shown below. Note the first term.
((x⁵⁰+y⁵⁰)^-1+e^(-50((x+1)²+y²-0.25))+e^(-50((x)²+(y-1)²-0.25)))^-1+e^(-50((x-1)²+(y)²-0.25))+e^(-50((x)²+(y+1)²-0.25))≤1
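As a sanity check, the final inequality can be evaluated directly at a few points (a sketch; the sample points are my choice, picked away from the origin so the x⁵⁰+y⁵⁰ term is nonzero):

```python
import math

def puzzle(x, y, n=50):
    """Left-hand side of the puzzle-piece inequality; <= 1 means inside."""
    e = math.exp
    square = x**n + y**n                                     # x^50 + y^50
    union = 1.0 / (1.0 / square
                   + e(-n * ((x + 1)**2 + y**2 - 0.25))      # union with circle E
                   + e(-n * (x**2 + (y - 1)**2 - 0.25)))     # union with circle F
    return (union
            + e(-n * ((x - 1)**2 + y**2 - 0.25))             # intersect with G'
            + e(-n * (x**2 + (y + 1)**2 - 0.25)))            # intersect with H'

print(puzzle(0.5, 0.5) <= 1)    # inside the square            -> True
print(puzzle(-1.0, 0.0) <= 1)   # centre of protruding circle E -> True
print(puzzle(1.0, 0.0) <= 1)    # centre of removed circle G    -> False
```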
Real Applications
Remember: if we can create good-looking graphs and their equations, then we can also create some pretty interesting functions commonly used in various fields, or even make some for ourselves. My background is in Machine Learning, so I am acquainted with some of these functions from my field.
Deriving Log-sum-exp and Softmax
We have all learnt a lot about the max function, which takes in multiple numbers and spits out the largest one. In many machine learning applications, we want the max operation not only to be as close as possible to the actual largest number but also to retain some relation to the numbers that are not the largest: the closer those numbers are to the largest, the more they should contribute to the result. This matters because it allows gradients to propagate through the non-maximum terms as well during backpropagation and other training algorithms. Without going deep into ML, we can say we need an approximation of the max operation that accounts for all the terms. Such approximation of fixed (hard) decision functions like max, if-else, sorting, etc. is called softening.
What is max, essentially? It is the union of the individual functions, with the largest one represented at the output. Note that the largest function in the union automatically has the smaller ones under it. As a one-dimensional example, max(y=1, y=2, y=3) is y=3. We can also write this as the boundary of Union(y≤1, y≤2, y≤3): the union is y≤3, so the boundary is y=3. Let’s visualize this for some more realistic functions:
Let the input functions be f(x), g(x) and h(x). We can represent them as y-f(x)≤0, y-g(x)≤0 and y-h(x)≤0. Let’s perform the union of these boundaries to generate our approximation of the max function:
(e^(-n(y-f(x)))+e^(-n(y-g(x)))+e^(-n(y-h(x))))^-1 ≤ 1
Taking the points on the boundary, we get: (e^(-n(y-f(x)))+e^(-n(y-g(x)))+e^(-n(y-h(x))))^-1 = 1
Rearranging the result to make it a function of x gives:
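Step by step (writing f for f(x) and so on, and using that on the boundary the sum itself equals 1):

```latex
\begin{aligned}
\left(e^{-n(y-f)}+e^{-n(y-g)}+e^{-n(y-h)}\right)^{-1} &= 1 \\
e^{-ny}\left(e^{nf}+e^{ng}+e^{nh}\right) &= 1 \\
e^{ny} &= e^{nf}+e^{ng}+e^{nh} \\
y &= \frac{1}{n}\,\ln\!\left(e^{nf}+e^{ng}+e^{nh}\right)
\end{aligned}
```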
y = ln(e^(nf(x))+e^(ng(x))+e^(nh(x)))/n
This is essentially the log-sum-exp form. 1/n is generally referred to as the temperature, denoted T, and is usually taken to be 1. For multiple terms we get y = ln(sum(e^xᵢ, for all i)), hence the name log-sum-exp (LSE).
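A quick numerical check of the approximation (a sketch; subtracting the max before exponentiating is the standard trick to avoid overflow):

```python
import math

def log_sum_exp(xs, n=1.0):
    """Smooth approximation of max(xs): ln(sum e^(n x_i)) / n.
    Larger n gives a tighter fit."""
    m = max(xs)  # subtract the max first for numerical stability
    return m + math.log(sum(math.exp(n * (x - m)) for x in xs)) / n

xs = [1.0, 2.0, 3.0]
print(log_sum_exp(xs, n=1))    # ~3.408: slightly above the true max
print(log_sum_exp(xs, n=10))   # ~3.000: larger n, tighter approximation
print(max(xs))                 # 3.0
```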
What about all that softening and ML stuff mentioned earlier? Note that log-sum-exp is always greater than the actual max; this difference accounts for the terms that are second-largest, third-largest, and so on. The output is a mixture of all terms, as seen in the equation above, but larger terms contribute much more than smaller ones since we have approximated the max operation. Remember: the larger the value of n, the closer the circle was to a square. Similarly, larger n means bigger terms contribute exponentially more than smaller ones, making the approximation more accurate. Notice the curved, differentiable corners as well; this is another benefit of our approximation that is very helpful in ML. By the way, what’s the derivative of log-sum-exp? It’s our common friend Softmax (yes, the classification-layer one). I hope you now see how this famous function also got its name. And yes, I know this is not a Wikipedia article, so I will move on.
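That the gradient of log-sum-exp is softmax can be verified with a finite difference (a sketch, for T = 1):

```python
import math

def lse(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

xs = [1.0, 2.0, 3.0]
h = 1e-6
# dLSE/dx_0 by central difference should match softmax(xs)[0]:
grad0 = (lse([xs[0] + h, xs[1], xs[2]]) - lse([xs[0] - h, xs[1], xs[2]])) / (2 * h)
print(abs(grad0 - softmax(xs)[0]) < 1e-6)   # True
```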
There is no dearth of applications of the log-sum-exp equation; many of its properties are discussed on its Wikipedia page. There are also papers using it directly as an architecture, for example as a universal convex approximator and as a universal function approximator. This function has so many applications that it has efficient built-in implementations in nearly all ML libraries, from NumPy to PyTorch and TensorFlow. The applications that follow are essentially special cases of this one.
Deriving the Soft-Plus Activation Function
The soft-plus activation is a type of non-linearity used in neural networks. One common non-linearity is ReLU, which takes the form max(0, x) and is very widely used. Under many conditions, as described in this paper, we need to approximate the ReLU function. Without diving deep into activation functions and ML details, we can handle this challenge by approximating ReLU with the methods we just learnt.
Thus we have to approximate max(0, x). Why not just reuse our log-sum-exp derivation? The two components needed are y≤0 and y-x≤0. Their union is (e^(-ny)+e^(-n(y-x)))^-1 ≤ 1, which gives us the equation y = ln(1+e^(nx))/n. When n is 1, this is referred to as the softplus activation function. Beyond its original paper, this function is also used in other activation functions like swish, mish and soft++.
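A small sketch comparing softplus with ReLU (the rewritten form inside `softplus` is the usual numerically stable version of ln(1+e^(nx))/n):

```python
import math

def softplus(x, n=1.0):
    # Stable evaluation of ln(1 + e^(n x)) / n:
    # ln(1 + e^z) = max(z, 0) + ln(1 + e^(-|z|))
    z = n * x
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / n

def relu(x):
    return max(0.0, x)

for x in (-5.0, 0.0, 5.0):
    print(round(softplus(x), 4), relu(x))
# softplus(-5) ~ 0.0067, softplus(0) = ln 2 ~ 0.6931, softplus(5) ~ 5.0067
```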
We can even go beyond and create our own variant of softplus, call it leaky-softplus: essentially an approximation of leaky-ReLU ( max(0.05x, x) ) obtained with the same procedure. The function takes the form y = ln(e^(0.05nx)+e^(nx))/n, and following the common ritual we set n=1. The result is shown below. Testing and experimentation are left to the reader. ;-)
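Here is a sketch of that leaky-softplus variant (note this name and function are the article’s own construction, not a standard library activation):

```python
import math

def leaky_softplus(x, n=1.0, slope=0.05):
    """Smooth approximation of leaky-ReLU: ln(e^(slope*n*x) + e^(n*x)) / n."""
    a, b = slope * n * x, n * x
    m = max(a, b)  # log-sum-exp trick for numerical stability
    return (m + math.log(math.exp(a - m) + math.exp(b - m))) / n

def leaky_relu(x, slope=0.05):
    return max(slope * x, x)

for x in (-10.0, 0.0, 10.0):
    print(round(leaky_softplus(x), 4), leaky_relu(x))
# For large |x| the smooth version hugs leaky-ReLU; near 0 it rounds the corner.
```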
Deriving Log-cosh Loss
In many regression tasks, a loss function called absolute loss is used, which is essentially the average absolute value of the errors. This loss is non-differentiable at zero and also lacks a second derivative for training algorithms that use higher-order derivatives. Log-cosh handles these problems very well by approximating it: it looks like mean squared error near zero and like absolute loss away from it. More can be learnt in this article. So, we have to approximate |x|, which is essentially max(x, -x). We can use the same old trick to get y = ln(e^(nx)+e^(-nx))/n. We haven’t reached our goal yet: add and subtract ln(2)/n, giving ln((e^(nx)+e^(-nx))/2)/n + ln(2)/n. The next step is to set n=1 and ignore the constant, as it doesn’t affect the training procedure. This gives ln((e^x+e^-x)/2), which is ln(cosh(x)): our log-cosh loss function.
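The two limiting behaviours can be checked numerically (a sketch; the rewritten form inside `log_cosh` avoids overflow of cosh for large x):

```python
import math

def log_cosh(x):
    # Stable form of ln(cosh(x)): |x| + ln(1 + e^(-2|x|)) - ln 2
    return abs(x) + math.log1p(math.exp(-2 * abs(x))) - math.log(2)

# Near zero it behaves like squared error (x^2 / 2)...
print(round(log_cosh(0.1), 6), round(0.1**2 / 2, 6))
# ...and for large errors like absolute loss shifted down by ln 2:
print(round(log_cosh(10.0), 4), round(10.0 - math.log(2), 4))
```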
Conclusion
Simple ideas coming out of sheer curiosity can have a very wide range of applications. There are no ceilings for curiosity and what it can give us. Hope these articles have provided you with new tools and perspectives which you can apply in fields as diverse as science, engineering and art. For curious minds, I would like to leave a good resource to improve your understanding of this idea: Differentiable Set Operations for Algebraic Expressions.
Challenge
Make this figure with a single inequality:
Make this figure with a single inequality, without using the modulus function:
Translated from: https://towardsdatascience.com/from-circle-to-ml-via-batman-part-ii-699aa5de4a66