Is there an online tool that can predict accuracy given only the dataset?


Is there an online tool that can predict accuracy given only the dataset as input (i.e. without the compiled model)?

That would help to understand how data augmentation/distribution standardization, etc., is likely to change the accuracy.


Posted 2020-04-13T12:46:16.647

Reputation: 31

Hi and welcome to AI SE! Maybe you should explain which models you are interested in and what kind of data you have. – nbro – 2020-04-13T13:24:21.587



If I understand your question correctly, the answer is no: there is no way of knowing in advance how well a model will perform on a dataset without actually training a model on it. That's the whole point of data science: you try, you analyse the results, and you try again using the knowledge gained from your previous attempts. It would be nice to shortcut the whole field and know in advance what to do to get the perfect model, but that is rather unrealistic.

Anyway, there are some standard steps that usually help you understand whether you're going in the right direction. For example, you can create a random benchmark to see how much better your model is than a random one. To create such a benchmark you don't need a specific tool: all main programming languages provide built-in functions to generate random numbers, and that's basically all you need. For example, in Python you could do something like:

import numpy as np

# create 30 true labels (values 1, 2, 3)
y_true = np.array([1, 2, 3] * 10)

# generate 30 random labels drawn from the same label set {1, 2, 3}
y_random = np.random.randint(1, 4, size=30)

# accuracy of the random model (expected ~1/3 for 3 balanced classes)
acc_random = np.sum(y_true == y_random) / len(y_true)
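Besides a uniform-random baseline, another common sanity check is the majority-class baseline: always predicting the most frequent label. This matters especially for imbalanced data, where a random model can look deceptively weak. A minimal numpy sketch (the labels below are illustrative):

```python
import numpy as np

# illustrative true labels: 20 of class 1, 10 of class 2 (imbalanced)
y_true = np.array([1] * 20 + [2] * 10)

# majority-class baseline: always predict the most frequent label
classes, counts = np.unique(y_true, return_counts=True)
majority = classes[np.argmax(counts)]
y_majority = np.full_like(y_true, majority)

# accuracy equals the frequency of the majority class
acc_majority = np.mean(y_true == y_majority)
print(acc_majority)  # 20/30 ≈ 0.667
```

Any trained model should clearly beat both the random and the majority baseline before you trust its accuracy number.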

Another thing worth mentioning is that in academia people are increasingly pushing to use the same datasets, in order to have comparable results. For example, if you're training an architecture meant for image classification, a standard dataset you should test your models on is MNIST. It is well known that a good model should achieve more than 99% accuracy on MNIST, so this is an established baseline for that dataset.
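As an illustration of that workflow, here is a minimal sketch using scikit-learn's small digits dataset (a stand-in for the full MNIST; the choice of logistic regression is arbitrary): train any candidate model, then compare its test accuracy against the known baseline for the dataset.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# load the 8x8 digits dataset (1797 samples, 10 classes)
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# a simple reference model; any classifier could be slotted in here
model = LogisticRegression(max_iter=2000)
model.fit(X_train, y_train)

# compare this number against the published baseline for the dataset
acc = accuracy_score(y_test, model.predict(X_test))
```

If your model falls far short of the accepted baseline, that gap tells you more than the raw accuracy number on its own.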

Edoardo Guerriero


Reputation: 1 098


There is no tool to achieve what you desire. A rough way to get a guess is to compare your dataset to other known datasets for which there are benchmark accuracy models. "Compare" is a vague notion at best, but things you might compare are:

  1. the number of your classes versus the number of classes in the reference dataset
  2. the number of train, validation and test samples
  3. the relative similarity of the data: for example, are they images? If so, compare image size, whether the images are cropped to the region of interest, etc.
  4. the similarity of class features, that is, whether your classes are very similar to each other. For example, classifying dogs by breed can be difficult because some dog breeds look almost identical, versus a situation like classifying different types of animals, which is easier. Estimate the similarity of your classes with respect to the similarity of classes in the reference dataset.

So you might be able to generate a rough estimate of what your model could achieve, particularly if you use transfer learning from benchmark models.
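The first two points above can be sketched as a small comparison of dataset statistics. The reference numbers below are MNIST's well-known class count and split sizes, while the "my dataset" labels are purely illustrative:

```python
import numpy as np

# illustrative labels for "my dataset": 3 classes, 30 samples each
y_mine = np.repeat([0, 1, 2], 30)

# summary statistics for my dataset
my_classes = len(np.unique(y_mine))
my_samples = len(y_mine)

# reference: MNIST has 10 classes and 60000 train + 10000 test samples
ref_classes, ref_samples = 10, 70000

print(f"classes: {my_classes} vs {ref_classes}")
print(f"samples: {my_samples} vs {ref_samples}")
```

Large gaps in either statistic (far fewer samples, or many more classes) are a warning that the reference benchmark's accuracy is unlikely to transfer to your setting.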

Gerry P


Reputation: 500