On the properties of Hyperbolic Tangent Kernel


How do Hyperbolic Tangent Kernels work? That is, what is the intuition behind them? Can you provide proofs and examples for illustration?

Hyperbolic Tangent Kernels are defined as: $$ K(x, x^\prime) = \tanh\bigg(\alpha (x\cdot x^\prime) + c\bigg)$$
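A minimal NumPy sketch of this definition, assuming real-valued input vectors. The helper names `tanh_kernel` and `tanh_gram` are hypothetical, and the values of $\alpha$ and $c$ below are arbitrary choices for illustration.

```python
import numpy as np


def tanh_kernel(x, x_prime, alpha=1.0, c=0.0):
    """K(x, x') = tanh(alpha * (x . x') + c) for two 1-D vectors."""
    return float(np.tanh(alpha * np.dot(x, x_prime) + c))


def tanh_gram(X, alpha=1.0, c=0.0):
    """Gram matrix with K[i, j] = tanh(alpha * (X[i] . X[j]) + c) for the rows of X."""
    X = np.asarray(X, dtype=float)
    return np.tanh(alpha * (X @ X.T) + c)


X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
K = tanh_gram(X, alpha=0.5, c=0.0)
print(np.round(K, 3))
```

For comparison, scikit-learn computes the same formula in `sklearn.metrics.pairwise.sigmoid_kernel`, where `gamma` and `coef0` play the roles of $\alpha$ and $c$.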

For example, for the Gaussian RBF kernel, the intuition is that each support vector influences the decision surface locally: its influence decays with distance from the support vector. What is the analog for Hyperbolic Tangent (sigmoid) kernels?
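The contrast in the question can be made concrete with a small numerical sketch. It assumes $\gamma = 1$ for the RBF kernel $\exp(-\gamma \lVert x - y\rVert^2)$ and $\alpha = 1$, $c = 0$ for the tanh kernel; the helpers `rbf` and `tanh_k` and the test points are hypothetical. The RBF value depends only on the distance to the anchor point, so it vanishes far away, while the tanh value depends only on the inner product, so a distant point in the same direction scores close to $+1$ and one in the opposite direction close to $-1$.

```python
import numpy as np


def rbf(x, y, gamma=1.0):
    # Gaussian RBF kernel: a function of the squared distance ||x - y||^2 only.
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(d, d)))


def tanh_k(x, y, alpha=1.0, c=0.0):
    # Sigmoid (tanh) kernel: a function of the inner product x . y only.
    return float(np.tanh(alpha * np.dot(x, y) + c))


anchor = np.array([1.0, 0.0])  # stands in for a support vector
points = {
    "near": np.array([1.1, 0.1]),            # close to the anchor
    "far_aligned": np.array([10.0, 0.0]),    # far away, same direction
    "far_opposite": np.array([-10.0, 0.0]),  # far away, opposite direction
}

results = {name: (rbf(anchor, p), tanh_k(anchor, p)) for name, p in points.items()}
for name, (r, t) in results.items():
    print(f"{name:>12}: rbf = {r:.4f}, tanh = {t:+.4f}")
```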

Some references on the Hyperbolic Tangent Kernels are:

  1. Hsuan-Tien Lin and Chih-Jen Lin
  2. Sabri Boughorbel, Jean-Philippe Tarel, Nozha Boujemaa


Posted 2016-03-02T09:27:41.087

Reputation: 461

Question was closed 2016-03-10T11:04:20.293


This was cross-posted to http://stats.stackexchange.com/questions/199620/on-the-properties-of-hyperbolic-tangent-kernel . Either site may be fine for this question. Since the other post got an answer that @Ragnar liked more, I propose this be closed as a duplicate. (Generally, we don't cross-post.)

– Sean Owen – 2016-03-09T20:59:38.510

I'm voting to close this question because it's a cross-post and it seems the OP found it better suited to Stats SE – Sean Owen – 2016-03-10T11:04:20.293



Sigmoid kernels owe their popularity to neural networks, which traditionally used the sigmoid activation function. Sigmoid kernels de-emphasize extreme correlation: because tanh saturates at ±1, once $\alpha (x\cdot x^\prime) + c$ is large, further increases in the inner product barely change the kernel value. In a way they behave a bit like correlation coefficients, which also have a limited range and emphasize similarity in orientation. $c$ shifts the operating point on the sigmoid, which changes how strongly the angle between the inputs is emphasized. Perhaps this visualization (for $c=0$) helps:

[Figure from: A New Mercer Sigmoid Kernel for Clinical Data Classification]
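To see the saturation and the role of $c$ numerically, here is a sketch for unit vectors in the plane, where $x \cdot x^\prime = \cos\theta$ and so $K = \tanh(\alpha \cos\theta + c)$. The angles, the values $\alpha \in \{0.5, 5\}$ and $c = -2$ are arbitrary assumptions, and the helper names are hypothetical. With a large $\alpha$, angles from 0° to 60° all map to values near 1, so the kernel barely distinguishes strong from extreme correlation; a nonzero $c$ moves the zero crossing to $\cos\theta = -c/\alpha$.

```python
import numpy as np


def unit(theta_deg):
    # Unit vector in the plane at angle theta_deg (degrees) from the x-axis.
    t = np.radians(theta_deg)
    return np.array([np.cos(t), np.sin(t)])


def tanh_k(x, y, alpha=1.0, c=0.0):
    # Sigmoid (tanh) kernel tanh(alpha * (x . y) + c).
    return float(np.tanh(alpha * np.dot(x, y) + c))


def profile(alpha, c=0.0, angles=(0, 30, 60, 90, 120, 180)):
    # Kernel value between the reference direction (0 degrees) and each angle.
    ref = unit(0.0)
    return [tanh_k(ref, unit(a), alpha, c) for a in angles]


for alpha in (0.5, 5.0):
    print(f"alpha={alpha}, c=0:", np.round(profile(alpha), 3))

# With c != 0 the zero crossing moves to cos(theta) = -c / alpha.
alpha, c = 5.0, -2.0
theta_zero = np.degrees(np.arccos(-c / alpha))
print(f"alpha={alpha}, c={c}: zero crossing near {theta_zero:.1f} degrees")
```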

Your first reference (Lin and Lin) states that sigmoid kernels behave like RBF kernels for certain parameters, which makes them suitable for nonlinear classification. You probably know that the sigmoid kernel is only conditionally PSD, and thus for some parameter choices it is not the kernel function of any implicit feature map, so Mercer's theorem does not apply.
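A quick numerical check of the PSD caveat, using hand-picked points and parameters (assumptions for illustration; the helper names are hypothetical). Every PSD kernel satisfies $K(x,y)^2 \le K(x,x)\,K(y,y)$, but saturation can break this: with $c = 0$, $\alpha = 1$ and the 1-D points $1$ and $2$, $\tanh(2)^2 \approx 0.929$ exceeds $\tanh(1)\tanh(4) \approx 0.761$. With $c < 0$, a point at the origin gives $K(0,0) = \tanh(c) < 0$. In both cases the Gram matrix has a negative eigenvalue, so no feature map can produce it.

```python
import numpy as np


def tanh_gram(X, alpha, c):
    # Gram matrix K[i, j] = tanh(alpha * (X[i] . X[j]) + c) for the rows of X.
    X = np.asarray(X, dtype=float)
    return np.tanh(alpha * (X @ X.T) + c)


def min_eigenvalue(K):
    # K is symmetric, so eigvalsh returns real eigenvalues in ascending order.
    return float(np.linalg.eigvalsh(K)[0])


# Case 1: c = 0, alpha = 1, 1-D points 1 and 2 (saturation breaks Cauchy-Schwarz).
K1 = tanh_gram([[1.0], [2.0]], alpha=1.0, c=0.0)

# Case 2: c < 0 and a point at the origin, so K(0, 0) = tanh(c) < 0.
K2 = tanh_gram([[0.0, 0.0], [1.0, 0.0]], alpha=1.0, c=-1.0)

for name, K in (("c=0", K1), ("c=-1", K2)):
    print(name, "smallest eigenvalue:", round(min_eigenvalue(K), 4))
```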

Someone this interested in kernel methods should have a copy of Learning with Kernels (Schölkopf and Smola)!


Posted 2016-03-02T09:27:41.087

Reputation: 9 953

@Ragnar I also don't understand why you antagonize answerers this way. You can vote down; you can clarify your question; you can ask for more detail; you can move on. You're not asking why the tanh function is called "hyperbolic," right? It's a rescaling of the logistic sigmoid function, hence the explanation above. Aren't you just asking about sigmoid kernels? – Sean Owen – 2016-03-08T22:10:53.907
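The rescaling mentioned in the comment is the identity $\tanh(z) = 2\sigma(2z) - 1$ with $\sigma(z) = 1/(1+e^{-z})$, which follows from $2/(1+e^{-2z}) - 1 = (1-e^{-2z})/(1+e^{-2z})$. Hence $\tanh(\alpha (x\cdot x^\prime) + c) = 2\sigma(2\alpha (x\cdot x^\prime) + 2c) - 1$, so the tanh kernel is an affine rescaling of a logistic-sigmoid kernel. A quick numerical check (the grid of $z$ values is an arbitrary choice):

```python
import numpy as np


def logistic(z):
    # Logistic sigmoid sigma(z) = 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + np.exp(-z))


z = np.linspace(-5.0, 5.0, 101)
max_err = float(np.max(np.abs(np.tanh(z) - (2.0 * logistic(2.0 * z) - 1.0))))
print("max |tanh(z) - (2*sigma(2z) - 1)| =", max_err)
```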