How to set class weights for imbalanced classes in Keras?



I know that there is a possibility in Keras with the class_weight parameter dictionary at fitting, but I couldn't find any example. Would somebody be so kind as to provide one?

By the way, in this case, is the appropriate practice simply to weight up the minority class proportionally to its underrepresentation?


Posted 2016-08-17T09:35:45.110

Reputation: 6 637



If you are talking about the regular case, where your network produces only one output, then your assumption is correct. In order to force your algorithm to treat every instance of class 1 as 50 instances of class 0 you have to:

  1. Define a dictionary with your labels and their associated weights

    class_weight = {0: 1.,
                    1: 50.,
                    2: 2.}
  2. Feed the dictionary as a parameter when fitting:

    model.fit(X_train, Y_train, nb_epoch=5, batch_size=32, class_weight=class_weight)
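As a plain-Python sketch (no Keras) of what this dictionary does: each sample's loss term is multiplied by the weight of its class, and the total loss becomes a weighted average. The per-sample loss values below are made up for illustration.

```python
# Toy illustration of class_weight: per-sample losses get scaled by the
# weight of the sample's class before averaging.
class_weight = {0: 1., 1: 50., 2: 2.}
per_sample_loss = [0.2, 0.9, 0.4, 0.1]   # hypothetical unweighted losses
labels = [0, 1, 2, 0]                    # integer class labels

weighted = [class_weight[c] * l for c, l in zip(labels, per_sample_loss)]
total = sum(weighted) / sum(class_weight[c] for c in labels)  # weighted average
```

Here the single class-1 sample contributes as much to the loss as 50 class-0 samples with the same unweighted loss would.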

EDIT: "treat every instance of class 1 as 50 instances of class 0" means that in your loss function you assign higher value to these instances. Hence, the loss becomes a weighted average, where the weight of each sample is specified by class_weight and its corresponding class.

From Keras docs:

class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only).




Also have a look at if you're working with 3D data.

– herve – 2017-04-26T09:12:48.837

For me it gives an error: dict doesn't have a shape attribute. – Flávio Filho – 2017-05-23T00:11:08.297

I believe Keras could be changing the way this works; this is for the version of August 2016. I will verify for you in a week. – layser – 2017-05-25T14:12:47.580

Does this work for one-hot-encoded labels? – megashigger – 2018-01-08T19:49:49.183

@layser Does this work only for 'categorical_crossentropy' loss? How do you give class_weight to Keras for 'sigmoid' and 'binary_crossentropy' loss? – Naman – 2018-04-15T19:26:01.297

@layser Can you explain "treat every instance of class 1 as 50 instances of class 0"? Is it that in the training set the row corresponding to class 1 is duplicated 50 times in order to make it balanced, or does some other process follow? – Divyanshu Shekhar – 2018-06-12T05:12:22.277

How can we know which class is class 0? Same for class 1. – Philippe Remy – 2020-11-10T08:26:57.957


You could simply use the class_weight utility from sklearn:

  1. Let's import the module first

    from sklearn.utils import class_weight
  2. In order to calculate the class weights, do the following (note that newer scikit-learn versions require keyword arguments for compute_class_weight)

    import numpy as np
    class_weights = class_weight.compute_class_weight('balanced', np.unique(y_train), y_train)
  3. Thirdly and lastly, add it to the model fitting:

    model.fit(X_train, y_train, class_weight=class_weights)
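For reference, 'balanced' follows scikit-learn's documented formula, n_samples / (n_classes * count_per_class), which you can reproduce with plain NumPy (the toy labels below are made up):

```python
import numpy as np

# Reproduce sklearn's 'balanced' heuristic by hand on a toy label array
y_train = np.array([0] * 90 + [1] * 10)           # 90/10 imbalance
classes = np.unique(y_train)
counts = np.bincount(y_train)
weights = len(y_train) / (len(classes) * counts)  # n / (k * count_c)
class_weights = dict(zip(classes, weights))
# class 1 (rare) gets weight 5.0; class 0 gets 100/180 ≈ 0.56
```

The rare class ends up weighted inversely to its frequency, so each class contributes equally to the loss in expectation.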

Attention: I edited this post and changed the variable name from class_weight to class_weights so as not to overwrite the imported module. Adjust accordingly when copying code from the comments.



For me, class_weight.compute_class_weight produces an array; I need to change it to a dict in order to work with Keras. More specifically, after step 2, use class_weight_dict = dict(enumerate(class_weight)) – C.Lee – 2017-10-13T04:33:48.690

This doesn't work for me. For a three-class problem in Keras, y_train is a (300096, 3) numpy array. So the class_weight= line gives me TypeError: unhashable type: 'numpy.ndarray' – Lembik – 2017-12-14T10:25:09.850

@Lembik I had a similar problem, where each row of y is a one-hot encoded vector of the class index. I fixed it by converting the one-hot representation to an int like this: y_ints = [y.argmax() for y in y_train]. – tkocmathla – 2018-04-12T14:19:36.677

What if I'm doing multiclass labeling so that my y_true vectors have multiple 1s in them: [1 0 0 0 1 0 0] for instance, where some x has labels 0 and 4. Even then, the total # of each of my labels is not balanced. How would I use class weights with that? – axolotl – 2018-11-25T18:28:31.203


I use this kind of rule for class_weight:

import numpy as np
import math

# labels_dict : {ind_label: count_label}
# mu : parameter to tune 

def create_class_weight(labels_dict,mu=0.15):
    total = np.sum(list(labels_dict.values()))
    keys = labels_dict.keys()
    class_weight = dict()
    for key in keys:
        score = math.log(mu*total/float(labels_dict[key]))
        class_weight[key] = score if score > 1.0 else 1.0
    return class_weight

# random labels_dict
labels_dict = {0: 2813, 1: 78, 2: 2814, 3: 78, 4: 7914, 5: 248, 6: 7914, 7: 248}

create_class_weight(labels_dict)


math.log smooths the weights for very imbalanced classes! This returns:

{0: 1.0,
 1: 3.749820767859636,
 2: 1.0,
 3: 3.749820767859636,
 4: 1.0,
 5: 2.5931008483842453,
 6: 1.0,
 7: 2.5931008483842453}
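As a sanity check on the log rule, the weight printed for class 1 can be recomputed by hand (total is the sum of the counts in labels_dict above):

```python
import math

total = 22107                        # sum of the counts in labels_dict
score = math.log(0.15 * total / 78)  # mu * total / count, for class 1 (count 78)
weight = score if score > 1.0 else 1.0
# weight ≈ 3.7498, matching the dict printed above
```

Majority classes fall below the log threshold and get clamped to 1.0, which is why several entries above are exactly 1.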



Why use log instead of just dividing the count of samples for a class by the total number of samples? I assume there is something I don't understand that goes into the param class_weight on model.fit_generator(...) – startoftext – 2017-05-04T03:11:19.593

@startoftext That's how I did it, but I think you have it inverted. I used n_total_samples / n_class_samples for each class. – colllin – 2017-10-19T17:34:11.100

In your example class 0 (has 2813 examples) and class 6 (has 7914 examples) have weight exactly 1.0. Why is that? The class 6 is few times bigger! You would want class 0 be upscaled and class 6 downscaled to bring them to the same level. – Vladislavs Dovgalecs – 2018-01-16T20:55:01.223

@VladislavsDovgalecs Because he is doing it to smooth the weights. When the class imbalance is measured in orders of magnitude, it's not very helpful to assign weights like 100; it harms the bigger class: you get false positives on that scarce class with the high weight. – apatsekin – 2020-03-03T18:14:23.760

If I have an imbalance such as {0: 1300000, 1: 40, 2: 2000}, is there any intuition as to how I can set the mu parameter here to handle the smoothing? – lamo_738 – 2020-09-14T19:17:01.740

@lamo this example explains how they get a value of 0.14 which is close to the "magical" 0.15 shown in this answer.

– Michael Szczepaniak – 2020-11-03T23:08:06.000


class_weight is fine but as @Aalok said this won't work if you are one-hot encoding multilabeled classes. In this case, use sample_weight:

sample_weight: optional array of the same length as x, containing weights to apply to the model's loss for each sample. In the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample. In this case you should make sure to specify sample_weight_mode="temporal" in compile().

sample_weight is used to provide a weight for each training sample. That means that you should pass a 1D array with the same number of elements as your training samples (indicating the weight for each of those samples).

class_weight is used to provide a weight or bias for each output class. This means you should pass a weight for each class that you are trying to classify.

sample_weight must be given a numpy array, since its shape will be evaluated.
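If what you actually have are class weights but need sample_weight (e.g. for one-hot or multi-label targets), you can expand them per sample yourself; a NumPy sketch with made-up weights and labels:

```python
import numpy as np

# Expand per-class weights into per-sample weights for one-hot labels
class_weights = np.array([1., 50., 2.])  # weight for classes 0, 1, 2
y_train = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [0, 0, 1],
                    [0, 1, 0]])          # one-hot labels
sample_weight = (y_train * class_weights).sum(axis=1)
# -> array([ 1., 50.,  2., 50.])
```

For multi-label rows with several 1s, the same product sums the weights of all active labels, which is one reasonable (but not the only) convention.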

See also this answer.

Charly Empereur-mot



Adding to the solutions above: if you need more than class weighting, and want different costs for false positives and false negatives, then with the new Keras version you can just override the respective loss function as given below. Note that weights is a square matrix.

from tensorflow.python import keras
from itertools import product
import numpy as np
from tensorflow.python.keras.utils import losses_utils

class WeightedCategoricalCrossentropy(keras.losses.CategoricalCrossentropy):

    def __init__(self, weights, from_logits=False, label_smoothing=0,
                 reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE,
                 name='categorical_crossentropy'):
        super().__init__(from_logits=from_logits, label_smoothing=label_smoothing,
                         reduction=reduction, name=f"weighted_{name}")
        self.weights = weights

    def call(self, y_true, y_pred):
        weights = self.weights
        nb_cl = len(weights)
        final_mask = keras.backend.zeros_like(y_pred[:, 0])
        y_pred_max = keras.backend.max(y_pred, axis=1)
        y_pred_max = keras.backend.reshape(
            y_pred_max, (keras.backend.shape(y_pred)[0], 1))
        y_pred_max_mat = keras.backend.cast(
            keras.backend.equal(y_pred, y_pred_max), keras.backend.floatx())
        for c_p, c_t in product(range(nb_cl), range(nb_cl)):
            final_mask += (
                weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
        return super().call(y_true, y_pred) * final_mask
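The masking logic above can be checked in plain NumPy (a toy stand-in, not the Keras class itself): with a cost matrix weights[true, pred], each sample's loss is scaled by the cost of its (true class, predicted class) pair:

```python
import numpy as np
from itertools import product

# Toy NumPy replica of the final_mask computation in the class above
weights = np.array([[1., 10.],
                    [2.,  1.]])                  # weights[true, pred]
y_true = np.array([[1, 0], [0, 1]])              # one-hot targets
y_pred = np.array([[0.2, 0.8], [0.6, 0.4]])      # predicted probabilities

nb_cl = len(weights)
# 1.0 where each row's prediction attains its max, 0.0 elsewhere
y_pred_max = (y_pred == y_pred.max(axis=1, keepdims=True)).astype(float)
final_mask = np.zeros(len(y_pred))
for c_p, c_t in product(range(nb_cl), range(nb_cl)):
    final_mask += weights[c_t, c_p] * y_pred_max[:, c_p] * y_true[:, c_t]
# sample 0: true 0 predicted as 1 -> cost 10; sample 1: true 1 predicted as 0 -> cost 2
```

Diagonal entries of 1 leave correct predictions unscaled, while off-diagonal entries let you penalize specific kinds of misclassification differently.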

Praveen Kulkarni



Here's a short recipe using scikit-learn:

import numpy as np
from sklearn.utils import class_weight

class_weights = dict(zip(np.unique(y_train),
                         class_weight.compute_class_weight('balanced', np.unique(y_train), y_train)))




I found the following example of coding up class weights in the loss function using the MNIST dataset. See link here.

from itertools import product
import keras.backend as K

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    # mask marking, for each sample, the class with the highest predicted score
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())
    # scale each sample by weights[true class, predicted class]
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    return K.categorical_crossentropy(y_true, y_pred) * final_mask




from collections import Counter

# Count samples per class, then weight each class by max_count / its count
itemCt = Counter(trainGen.classes)
maxCt = float(max(itemCt.values()))
cw = {clsID: maxCt / numImg for clsID, numImg in itemCt.items()}

This works with a generator or with standard data. Your largest class will have a weight of 1, while the others will have values greater than 1 depending on how infrequent they are relative to the largest class.
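A quick self-contained run of the same rule on made-up label counts (no generator needed):

```python
from collections import Counter

# Same max-count rule applied to a hypothetical flat list of class labels
labels = [0] * 100 + [1] * 25 + [2] * 50
itemCt = Counter(labels)
maxCt = float(max(itemCt.values()))
cw = {clsID: maxCt / numImg for clsID, numImg in itemCt.items()}
# -> {0: 1.0, 1: 4.0, 2: 2.0}
```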

class_weight accepts a dictionary-type input.

