Talk:Sigmoid function
This article is rated Start-class on Wikipedia's content assessment scale.
Untitled
Call me crazy, but since when is the Hyperbolic_cosine considered "S shaped"? If this is a typo, I'm not sure what other function it was supposed to be. --65.147.0.105 15:25, 28 May 2004 (UTC)
- Right you are, I don't know what I was thinking. Fixed. (The error function is a proper sigmoid, right?)
Jorge Stolfi 10:24, 31 May 2004 (UTC)
Recent changes to this page
Isn't it better to redirect this page to the logistic function page? Or restore this page to its former glory? The current page is kinda pathetic.
local extrema?
Do you really mean "local minimum" and "local maximum"? The example function given clearly doesn't have any local minima or maxima (but it does have a global minimum of 0 and a global maximum of 1) -- Somebody
Perhaps what is meant is that the second derivative (curvature) has a local minimum and maximum? BTW, I do not agree that the function has global extrema, because as I learned them and as the article on them states, they are points in the domain of the function and are always also local extrema. This function has none. The image of the function has a supremum of 1 and an infimum of 0, though (i.e. the asymptotes of this function are y=0 and y=1). 82.103.198.180 10:03, 23 July 2006 (UTC)
Maybe it's for complex argument values? One is led to think of reals only because the plot is 2d, but maybe the text doesn't assume that. Coffee2theorems 18:22, 23 July 2006 (UTC)
Examples
I'd like to add a gallery of sigmoid-like curves to this article. The hemoglobin example is a nice one. Any others? --HappyCamper 17:22, 30 March 2007 (UTC)
- I am adding an example that includes "most" of the sigmoids having a closed-form equation. Hope that it is of interest and is not seen as COI. This is my very first Wikipedia edit, if I am doing something inappropriate please remove or modify. Dasomath (talk) 20:10, 15 July 2022 (UTC)
Sign
For the double sigmoid function, do you mean sin?
I also think the double sigmoid function is wrong. What about this one?
or this one:
Image
I see that someone changed the image size recently in order to avoid resolution problems. Maybe the image should be replaced after all with the almost identical vector image? --Hagman-de 15:53, 16 June 2007 (UTC)
a slightly more useful definition?
I've used the sigmoid function on and off, for a long time (about 8 years), and what I use is of course similar to what is presented here, but I would suggest adding two elements into the definition -- a "gain" or "sharpness" factor "k" or "g" -- and a "threshold" or "slider" term that allows the function to be "slid" back and forth across the X-axis:
- Y(t) = 1/(1 + e^(k*(X - thr)))
The neat thing about this more expanded definition is the following:
- The "gain" at X = "thr", is the derivative of course, but it is 1/4 the value of k (as I remember)
- The curve can be "flipped around" by changing the sign of k; thus the sigmoid can be made to act like a Boolean NOT if "thr" is 0.5 and k is positive,
- You see the failure of "the law of excluded middle" (LoEM): no matter how huge the k, the value of the function at X = "thr" is 0.5. This violates the LoEM.
- You can build e.g. an OR gate by adding X1 and X2, subtracting "thr" = 0.5 and then squashing the sum with the sigmoid:
- OR(X1, X2) = 1/(1 + e^(-12*(X1 + X2 - 0.5)))
- Given that you can build an OR and a NOT you now can approximate any Boolean function.
- Similarly, in a plane, the value of Z(t) will be 0.5 all along a line (it looks like a folded plane)
- From Y = mX + b,
- Y/b + (m/b)*X = 0
- Z(t) = sig(Y/b + (m/b)*X - thr)
- Two of the above Z(t), with reversed signs and slightly offset by different thresholds, added together make a line, like a mountain range on a map, or a canyon. However, if you put three of these plateaus, i.e. "folded sheets" (for a total of just 3 sigmoids), on the X-Y plane, get the signs of their k's right, add them together and pass them through a "second-layer" sigmoid, you have a "triangle" that can be shrunk with higher values of the k's to make a single Matterhorn stick up anywhere on the plane (or make a sink-hole).
- Given that you can make Matterhorns to your heart's content anywhere on the plane, you can add them together and approximate any curve by "bleeding" one into another. This summation proves that sigmoids can be used to approximate any arbitrary curve, much like a 2-D Fourier transform.
Some of this stuff can be found in a book titled:
- Tom M. Mitchell, Machine Learning, WCB-McGraw-Hill, 1997, ISBN 0-07-042807-7
In particular see "Chapter 4: Artificial Neural Networks" where the Boolean abilities of "perceptrons" are defined as well. I happened onto the tricky business of adding three folded planes together to make a "triangle" (and passing them through a second-layer sigmoid) because a neural net showed me this (!). I've not seen it documented anywhere, but I did see the results of it in a journal once. I'm sure someone who knows the literature better could cite the source. Proofs similar to the above are mentioned in Mitchell. This stuff is easy to do in Excel. wvbaileyWvbailey 18:39, 17 June 2007 (UTC)
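For anyone who wants to try the gain/threshold form outside Excel, here is a minimal Python sketch of the definition and the two gates described above. The function names and the gain of 12 used for NOT are illustrative assumptions, not taken from Mitchell; only the OR formula with k = -12 comes from the comment itself.

```python
import math

def sigmoid(x, k=1.0, thr=0.0):
    """Y = 1 / (1 + e^(k*(x - thr))), using the sign convention of the comment above."""
    return 1.0 / (1.0 + math.exp(k * (x - thr)))

def soft_not(x, k=12.0):
    """Approximate Boolean NOT: positive k flips the curve, thr = 0.5 (k = 12 is an assumed gain)."""
    return sigmoid(x, k=k, thr=0.5)

def soft_or(x1, x2, k=-12.0):
    """Approximate Boolean OR: add the inputs, subtract thr = 0.5, then squash."""
    return sigmoid(x1 + x2, k=k, thr=0.5)

def slope_at(f, x, h=1e-6):
    """Central-difference estimate of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(f"OR({a}, {b}) ~ {soft_or(a, b):.4f}")
    print(f"NOT(0) ~ {soft_not(0):.4f}, NOT(1) ~ {soft_not(1):.4f}")
    # The magnitude of the slope at the threshold should be close to |k| / 4.
    print(abs(slope_at(lambda x: sigmoid(x, k=8.0, thr=0.5), 0.5)))
```

The value at X = thr stays at 0.5 whatever k is, which is the "excluded middle" point made above. Very large |k| times a large offset will overflow math.exp, so a production version would need a guarded exponential.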
Another sigmoid?
I wonder if it would be useful to list the following function among the sigmoids:
I have seen it used as a "hack" when a fast S-shaped function was needed, avoiding the (computer) evaluation of exp(x). Its derivative is flat at 0 and 1, and it is symmetrical with respect to the midpoint (meaning, ). For many purposes it works fine, as long as you don't run outside the range [0,1]. —Preceding unsigned comment added by Pasmao (talk • contribs) 12:44, 27 October 2007 (UTC)
- It would be interesting to add something like this. I fiddled with this notion with respect to what would be required for mother nature to build a squasher for making neuralogical ANDs and ORs, and was able to get to some pretty nice approximations -- as long as you stay within the interval. Somewhere I actually worked out the math for this ... a problem arises because, to be useful, the AND etc needs some "gain" in the middle (i.e. a slope > 1) but the more gain you put in the more difficult the design becomes. For an OR you need a range of -0.25 to +2.25 (i.e. if inputs are "a" and "b" that vary from 0 to 1, add them and squash their sum back to approximately 0 or 1). The first hack starts out with the odd function y = 1*(x-0.5) + 0.5 (just a straight line shifted to the right: yielding (0,0), (1,1) ). This clearly won't work. The trick then is to feed back a certain amount of x^2 to give you some "gain", etc, etc. As I remember this works best if it goes through two iterations. I'm working from memory here... bill Wvbailey (talk) 17:18, 13 January 2008 (UTC)
This is nice! Actually you can generalize it
You assure that f(0)=0, f(1)=1 and that f'(0)=f'(1)=0. By playing around with a and b you can get different shapes to suit you Juancentro (talk) 23:36, 18 April 2013 (UTC)
It is already mentioned under the smoothstep example, which doesn't show an explicit formula, but a family of them. These are Hermite polynomials, and 3x^2 - 2x^3, or equivalently the more stable (and using fewer multiplications) x^2(3 - 2x), are the usual definition of smoothstep in computer graphics (and for example a part of GLSL, and often implemented in hardware with clamping outside of the [0,1] range). Sometimes people use smootherstep, which is basically smoothstep(smoothstep(x)). 81.6.34.172 (talk) 16:36, 31 May 2020 (UTC)
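A small Python sketch of the polynomial "hack" discussed in this thread, assuming the usual computer-graphics definitions: smoothstep as t*t*(3 - 2*t) with clamping to [0,1], and smootherstep as the quintic 6t^5 - 15t^4 + 10t^3. The helper name clamp01 is made up for illustration. The quintic and the composed form smoothstep(smoothstep(x)) agree at 0, 0.5 and 1 but are different polynomials (degree 5 versus degree 9), so the script prints both for comparison.

```python
def clamp01(x):
    """Clamp x to the interval [0, 1] (hypothetical helper name)."""
    return max(0.0, min(1.0, x))

def smoothstep(x):
    """Cubic S-curve in factored form: t^2 * (3 - 2t)."""
    t = clamp01(x)
    return t * t * (3.0 - 2.0 * t)

def smoothstep_expanded(x):
    """Same cubic written as 3t^2 - 2t^3."""
    t = clamp01(x)
    return 3.0 * t**2 - 2.0 * t**3

def smootherstep(x):
    """Quintic S-curve: 6t^5 - 15t^4 + 10t^3, evaluated in nested form."""
    t = clamp01(x)
    return t * t * t * (t * (6.0 * t - 15.0) + 10.0)

if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(x, smoothstep(x), smootherstep(x), smoothstep(smoothstep(x)))
```

No exponential is evaluated anywhere, which is the point of the hack; the price is that the curve is only meaningful on [0,1], hence the clamp.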
Derivative Clarification
I'm pretty sure that not all sigmoid functions have the derivative:
Perhaps a minor clarification would be in order. —Preceding unsigned comment added by 128.111.110.55 (talk) 02:12, 11 December 2007 (UTC)
This formula is only for tanh for example has a derivative of 1-tanh^2. This is also confusing as f(...) can be mistaken for applying function f to (...) where in this case it means the result of multiplying function f with 1-f. dP/df = (P)*(1-P) would be clearer.
Jfmiller28 (talk) 23:09, 2 January 2008 (UTC)
- is not even the special case of the logistic function mentioned in the text. How could the reader know what function the formula applies to? This part of the text is very confusing. 130.234.198.85 (talk) 14:36, 7 January 2008 (UTC)
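The point raised in this thread can be checked numerically. The sketch below, with illustrative function names, compares a central-difference derivative against the two closed forms mentioned above: P*(1-P) for the logistic function 1/(1+e^-x), and 1-tanh^2 for tanh. It is only a sanity check under those assumptions, not a proof.

```python
import math

def logistic(x):
    """Standard logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def numeric_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def logistic_slope_from_output(p):
    """dP/dx written in terms of the output P: P * (1 - P)."""
    return p * (1.0 - p)

def tanh_slope_from_output(t):
    """d tanh/dx written in terms of the output t: 1 - t^2."""
    return 1.0 - t * t

if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
        p, t = logistic(x), math.tanh(x)
        print(x,
              numeric_derivative(logistic, x), logistic_slope_from_output(p),
              numeric_derivative(math.tanh, x), tanh_slope_from_output(t))
```

Applying P*(1-P) to the output of tanh gives the wrong slope, which is why the article should say which function the formula belongs to.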
Are some of the sections talking about the logistic?
Please see my questions in comments. New Image Uploader 929 (talk) 00:50, 30 May 2008 (UTC)
- My text by Mitchell, which I listed on the article page (the only reference, BTW), equates the two:
- "σ(y) = 1/(1+e^-y)
- "σ is often called the sigmoid function or, alternately, the logistic function. Note that its output ranges between 0 and 1 .... Because it maps a very large input domain to a small range of outputs, it is often referred to as the squashing function of the unit [cf Figure 4.6 The sigmoid threshold unit; in this drawing, σ(y) = 1/(1+e^-net), where net = Σ_i (w_i*x_i) summed from i = 0, w_i is the ith weight for the ith input x_i, and x_0 is a constant -- x_0 is important(!)]. The sigmoid function has the useful property that its derivative is easily expressed in terms of its output..." (Mitchell 1997:96-97)
- My guess is writers who distinguish between the two are (needlessly) splitting the hare (hair) and using two different names for the same function depending on where it is used. "Logistic" would seem to come from "logic" i.e. having 1 and 0 outcomes only; "Sigmoid" because of its shape as in "sigmoidoscopy". Anyway, as this is wikipedia and we need sources to back up our claims, mine says they are the same thing. Bill Wvbailey (talk) 15:07, 30 May 2008 (UTC)
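As a rough illustration of the sigmoid threshold unit quoted above, here is a short Python sketch. It assumes the constant input x_0 is fixed at 1, so that w_0 acts as a bias; the quote only says x_0 is a constant. The function name and the example weights are made up for illustration.

```python
import math

def sigmoid_unit(weights, inputs):
    """Output of one sigmoid unit: sigma(net), net = sum_i w_i * x_i with x_0 = 1 (assumed)."""
    xs = [1.0] + list(inputs)          # prepend the constant input x_0
    if len(weights) != len(xs):
        raise ValueError("need one weight per input plus one for x_0")
    net = sum(w * x for w, x in zip(weights, xs))
    return 1.0 / (1.0 + math.exp(-net))

if __name__ == "__main__":
    # Hypothetical weights that make the unit behave like a soft OR of two inputs.
    w = [-6.0, 12.0, 12.0]
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, round(sigmoid_unit(w, [a, b]), 4))
```

Without the x_0 term the unit's 0.5 crossing is pinned to net = 0 at the origin, which is presumably why the quote calls x_0 important.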
External link to Logistic Function implementation in Excel should be maintained
An external link that has been in this page for a good while [1] points to a very useful implementation of the model in Excel. I've found the author moved the site so the link redirects to [2]. I tried to update it, and now a user, MrOllie, has been deleting this edit, pointing to WP:EL (policy on external links). I appreciate his point, as in many topics of opinion blogs are not an authoritative source. As this article is about math, I can't see the difference between the resource [3] and [4]. Both describe in further detail the logistic curve and show someone interested how to implement it. I find the Excel version very useful and relevant, and from comments in [5], some other people find it useful as well. I would like to see other members' opinions. 218.82.217.162 (talk) 17:46, 31 October 2008 (UTC)
A sigmoid curve is produced by a mathematical function having an "S" shape —Preceding unsigned comment added by 220.225.131.157 (talk) 04:16, 12 April 2010 (UTC)
Sigmoid and sigmoidal
the following is called a "sigmoidal function" in another article:
σ
Is it OK to create a sigmoidal function redirect to this article? Or are they different things? walk victor falk talk 02:16, 15 February 2011 (UTC)
- Sigmoidal and sigmoid seem to me as they are the same thing; maybe there is some very slight difference but it's not pointed out by the article. --Kri (talk) 00:38, 16 February 2011 (UTC)
- Actually, there are two things going on. If one is referring to "THE sigmoid function", that is just another name for the logistic function. Secondarily, there is a class of functions of sigmoidal shape with y-axis ranges of +/- 1 and slope of 1 at the origin that are very useful for inverse transforms. An example of this is the Fisher transform that transforms correlation coefficients from their -1 to 1 range to -infinity to +infinity of a more normally distributed shape to allow for confidence intervals to be calculated, that would be arctanh(r) and then its inverse tanh(r+/-(n sd)), where n is the number of sd needed to make the confidence interval desired. Next, 2/[1+exp(-x)]-1 would be sigmoidal with y-axis range of -1 to +1, but slope 1/2 at the origin. To make it have slope one, one writes tanh[x], which is none other than the logistic function rescaled to go from -1 to +1 on the y axis with a slope of 1 at the origin, which then makes it into none other than the inverse of the Fisher Transform. Another example, t=s/(1-s) would transform a beta distribution (on 0 to 1) to be a beta prime distribution of t (on 0 to +infinity), its inverse transform would be s=t/(1+t) that transforms a beta prime to a beta. Now, to extend the range (-1 to 1 on the y-axis) one writes s=t/(1+|t|) and t=s/(1-|s|), where s=t/(1+|t|) is a standard slope sigmoidal function of slowest asymptote in the figure. Summarizing, the logistic function in standard slope +/-1 y-axis range format is tanh, it and the other similar sigmoidal shaped functions are inverse transform candidates for many uses. Conversely, one could rescale slope 1, y-axis +/-1 sigmoidal functions to be probability functions like the logistic function, but with differing asymptotic convergence rates.
I have found the figure of +/-1 range standard slope sigmoidal functions to be much more useful than the more vanilla sigmoid function, A.K.A logistic function, and when I look up sigmoidal functions, I would rather see that. Finally, why a slope of 1? As it turns out, near zero, that makes the limiting values of the transform and its inverse on the same scale. Nor is this any sort of limitation, one can easily change the asymptotic scale, if so desired. Finally I would distinguish between "the" sigmoid function and other sigmoidal functions to have a more useful article. CarlWesolowski (talk) 03:29, 23 November 2024 (UTC)
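The rescaling relationships in this comment can be sanity-checked with a few lines of Python. The sketch assumes the standard identity tanh(x) = 2/(1+exp(-2x)) - 1, and the function names (rescaled_logistic, abs_sigmoid and so on) are illustrative labels rather than established terminology. It estimates slopes at the origin by central differences and round-trips the transform pairs mentioned above.

```python
import math

def rescaled_logistic(x):
    """2/(1 + exp(-x)) - 1: the logistic function stretched to the range (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def abs_sigmoid(t):
    """s = t / (1 + |t|), mapping the real line into (-1, 1)."""
    return t / (1.0 + abs(t))

def abs_sigmoid_inverse(s):
    """t = s / (1 - |s|), defined for -1 < s < 1."""
    return s / (1.0 - abs(s))

def slope_at_origin(f, h=1e-6):
    """Central-difference estimate of f'(0)."""
    return (f(h) - f(-h)) / (2.0 * h)

if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        # tanh(x) should match the rescaled logistic evaluated at 2x
        print(x, math.tanh(x), rescaled_logistic(2.0 * x))
    print("slopes at 0:",
          slope_at_origin(rescaled_logistic),
          slope_at_origin(math.tanh),
          slope_at_origin(abs_sigmoid))
    r = 0.8
    print("Fisher round trip:", math.tanh(math.atanh(r)))
```

Printing the slopes side by side makes it easy to see which of the three forms already has unit slope at the origin and which needs its argument rescaled.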
Sign of first derivative
Currently the section "Definition" says
- A sigmoid function is a bounded differentiable real function that is defined for all real input values and has a positive derivative at each point.
but then in the very next sentence the section "Properties" says
- In general, a sigmoid function is real-valued and differentiable, having either a non-negative or non-positive first derivative which is bell shaped.
So the article is inconsistent as to whether it must be upward sloping or whether it can alternatively be downward sloping, and (if downward sloping is precluded) as to whether it must have a positive or just a non-negative derivative. Duoduoduo (talk) 14:15, 2 November 2013 (UTC)
asymmetric sigmoid function
This page is missing a separation of symmetrical and asymmetrical sigmoid functions, e.g. the Gompertz function is an asymmetric sigmoid http://en.wikipedia.org/wiki/Gompertz_function but there are many others.
Olbran (talk) 10:50, 17 March 2015 (UTC)
Definition and properties not consistent
The Properties section should be consistent with the Definition section. The Definition says that the derivative must be positive. The Properties section says that it must be either non-positive or non-negative. It's unnecessary anyway to restate properties that are explicit in the definition (other properties that can be derived from the definition should be mentioned), but it's unhelpful at least to be inconsistent. — Preceding unsigned comment added by 209.93.31.116 (talk) 17:20, 29 April 2017 (UTC)
Generalized symmetrical sigmoid function
I've found this page quite useful in my work in the statistics of vaccines research, especially the illustration of the six normalized functions, but agree it needs major clean up. Separate pages would be preferable for symmetrical sigmoid functions, of which there are probably only a limited number of known algebraic examples, and asymmetrical sigmoid functions, of which there must be quite a few more. Separating out those which have non-zero first derivatives on the whole domain from those which do not would also be advantageous, since any function which has a zero gradient at a couple of points can be converted into a sigmoid function by appropriately segmenting the domain, e.g. the Smoothstep, so these would be better grouped elsewhere. Also, functions that simply shift or rescale other functions could be consolidated – e.g. the hyperbolic tangent and logistic. Surely there must be pages somewhere on how to shift and rescale functions for those who wish to know how to do this.
Some recent work introduced a generalized symmetrical sigmoid function
where is a shape parameter governing how fast the curve approaches the asymptotes for a given slope at the inflection point. When the function is the 'absolute sigmoid function' shown in the illustration, and when the function is the 'square root sigmoid function'; when the function approximates the arctangent function, when it approximates the logistic function, and when it approximates the error function.[1]. This should be considered for inclusion.
Finally, I promise to come back and clean up this page as soon as I've finished my research on the statistics of vaccines research :). Adunning2 (talk) 02:22, 19 July 2018 (UTC)
References
- ^ Dunning, AJ; et al. (28 Dec 2015). "Some extensions in continuous methods for immunological correlates of protection". BMC Medical Research Methodology. 15 (107). doi:10.1186/s12874-015-0096-9. PMID 26707389.
Website in the image
Are private websites in images allowed, like the one in this page? Is it not considered a kind of subtle advertisement? — Preceding unsigned comment added by Raffamaiden (talk • contribs) 15:47, 3 April 2019 (UTC)
"A sigmoid function" vs. "The sigmoid function"
As it stands now, the article apparently flips between talking about sigmoid functions as members of a class of functions with "S"-shaped graph in general and talking about the logistic function as a specific example. Frankly, I would just remove the section Sigmoid function#Approximate inverse as it talks about the logistic function specifically. (And is unsourced anyway.) – Tea2min (talk) 05:48, 16 June 2020 (UTC)
- Done, also rolling back the organizational change. Hopefully I didn't clobber anything else in the middle, apologies if so. –Deacon Vorbis (carbon • videos) 14:13, 16 June 2020 (UTC)
- The article still seems to vacillate between identifying a sigmoid function as a general class of 'S'-shaped functions, and considering it to be specifically the logistic function. Not sure who can clear this up or on what authority. Seems like one avenue would be to research the history of the term "sigmoid function", and if there has been some change over time about how that term is applied, to expound briefly on the evolution of the term. Any 'history of math' scholars out there who may be able to shed some light? Y2PK (talk) 13:24, 23 September 2024 (UTC)
- The vacillation was reintroduced by a single edit to the lead a few months ago, which had made the article inconsistent, and which I have just undone. The references to this article themselves provide evidence that "sigmoid function" is not universally a synonym for "logistic function", with usage varying from one field to another. Some additional illustrative (though not necessarily authoritative) material:
- García, Oscar. (2005). Unifying sigmoid univariate growth equations. Forest Biometry, Modelling and Information Sciences (FBMIS). 1. 63-68.
- Godeau, U., Bouget, C., Piffady, J., Pozzi, T., & Gosselin, F. (2020). Lack of definition of mathematical terms in ecology: The case of the sigmoid class of functions in macro-ecology. Ecology and Evolution, 10(24), 14209–14220.
- stackoverflowuser2010 (2021). Is Wikipedia's page on the sigmoid function incorrect? Cross Validated (stats.stackexchange.com), question 544711.
- Wikipedia articles such as Error function and Gompertz function, which talk of their subjects as being sigmoid functions.
- Duplode (talk) 04:10, 8 December 2024 (UTC)
Problem in example
I stumbled over the second of the two algebraic functions in the example section: x^n/(x^n + (1-x)^n). As far as I can see, this is not a sigmoid function. It violates the definition of a sigmoid given in the section Definition. Notably, the derivative is not non-negative and it has multiple inflection points.
So this is really confusing, and I'd suggest removing the example; I will do so if nobody objects.
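The objection can be illustrated with a quick numerical check. The Python sketch below fixes n = 2 as an assumed example value and samples x^n/(x^n + (1-x)^n) on [0,1] and on the wider interval [-2,3]; the helper name is_nondecreasing is made up for illustration.

```python
def f(x, n=2):
    """x^n / (x^n + (1 - x)^n); n = 2 is an assumed example value."""
    return x**n / (x**n + (1.0 - x)**n)

def is_nondecreasing(values):
    """True if each sample is at least as large as the one before it."""
    return all(b >= a for a, b in zip(values, values[1:]))

inside = [f(i / 100.0) for i in range(101)]          # samples on [0, 1]
outside = [f(-2.0 + i / 20.0) for i in range(101)]   # samples on [-2, 3]

if __name__ == "__main__":
    print("monotone on [0, 1]:", is_nondecreasing(inside))
    print("monotone on [-2, 3]:", is_nondecreasing(outside))
    print("f(-1), f(0), f(1), f(2):", f(-1.0), f(0.0), f(1.0), f(2.0))
```

For n = 2 the function rises from 0 to 1 on the unit interval but falls on either side of it, so it only looks S-shaped when restricted to [0,1].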
A Commons file used on this page or its Wikidata item has been nominated for deletion
The following Wikimedia Commons file used on this page or its Wikidata item has been nominated for deletion:
Participate in the deletion discussion at the nomination page. —Community Tech bot (talk) 20:25, 11 September 2021 (UTC)
Is the "bounded" restriction generally accepted?
The article claims "A sigmoid function is a bounded, differentiable, real function [...]" whereas Logit#Comparison_with_probit states "The logit and probit are both sigmoid functions [...]". I don't think both of these statements can be true because logit and probit are unbounded. I noticed the reference cited for the sentence claiming the "bounded" restriction is from Lecture Notes in Computer Science in the article "From Natural to Artificial Neural Computation" so I suspect this definition is likely to be domain-specific. Should the "bounded" restriction be removed or qualified here?
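To make the boundedness point concrete, here is a minimal Python sketch, assuming the usual definition logit(p) = ln(p/(1-p)) as the inverse of the logistic function. It shows the logistic output staying inside (0, 1) while the logit grows without bound as p approaches 0 or 1.

```python
import math

def logistic(x):
    """Logistic function, bounded between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Inverse of the logistic function, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

if __name__ == "__main__":
    for x in (-20.0, -2.0, 0.0, 2.0, 20.0):
        print("logistic", x, logistic(x))
    for p in (1e-9, 1e-3, 0.5, 1.0 - 1e-3, 1.0 - 1e-9):
        print("logit", p, logit(p))
```

Under the article's "bounded" wording only the first of the two functions would qualify, which is the inconsistency raised above.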