How about using NumPy's `np.random.choice`?

```
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

def ttv_split(X, y=None, train_size=0.6, test_size=0.2, validation_size=0.2, random_state=42):
    """
    Basic approach using np.random.choice: each row is assigned to a split
    at random, so the resulting sizes are only approximately the requested
    proportions.
    """
    np.random.seed(random_state)
    size = train_size + test_size + validation_size
    # Compare with a tolerance: 0.6 + 0.2 + 0.2 may not be exactly 1.0 in floats.
    if not np.isclose(size, 1.0):
        raise ValueError(f"Split proportions must sum to 1, got {size}")
    X = pd.DataFrame(X, columns=["col_" + str(i) for i in range(X.shape[1])])
    n_samples = X.shape[0]
    split_series = pd.Series(np.random.choice(
        a=["train", "test", "validation"],
        p=[train_size, test_size, validation_size],
        size=n_samples))
    X_train = X.iloc[split_series[split_series == "train"].index, :]
    X_test = X.iloc[split_series[split_series == "test"].index, :]
    X_validation = X.iloc[split_series[split_series == "validation"].index, :]
    if y is not None:
        y = pd.DataFrame(y, columns=["target"])
        y_train = y.iloc[split_series[split_series == "train"].index, :]
        y_test = y.iloc[split_series[split_series == "test"].index, :]
        y_validation = y.iloc[split_series[split_series == "validation"].index, :]
        return X_train, X_test, X_validation, y_train, y_test, y_validation
    return X_train, X_test, X_validation

X, y = load_iris(return_X_y=True)
X_train, X_test, X_validation, y_train, y_test, y_validation = ttv_split(X, y)
```

Yes, this works of course but I hoped for something more elegant ;) Never mind, I accept this answer. – Hendrik – 2016-11-17T08:10:53.463


I wanted to add that if you want to use the validation set to search for the best hyper-parameters you can do the following after the split: https://gist.github.com/albertotb/1bad123363b186267e3aeaa26610b54b
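The linked gist is not reproduced here, but one common way to use a held-out validation set for hyper-parameter search is scikit-learn's `PredefinedSplit`. A minimal sketch (the estimator, parameter grid, and variable names are illustrative, not from the gist):

```python
# Sketch: tune hyper-parameters on a fixed validation split instead of
# cross-validation, using PredefinedSplit.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, PredefinedSplit, GridSearchCV
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
# Hold out 20% for the final test set, then split off a validation set.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=42)

# -1 marks rows that are always in training; 0 marks the single validation fold.
test_fold = np.concatenate([np.full(len(X_train), -1), np.zeros(len(X_val), dtype=int)])
ps = PredefinedSplit(test_fold)

# GridSearchCV then scores every candidate on the validation rows only.
search = GridSearchCV(LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}, cv=ps)
search.fit(np.vstack([X_train, X_val]), np.concatenate([y_train, y_val]))
print(search.best_params_)
```

After the search, the untouched `X_test`/`y_test` give an unbiased final evaluation.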

– skd – 2018-06-06T16:34:04.537

So what is the final train, test, validation proportion in this example? Because on the second `train_test_split`, you are doing this over the previous 80/20 split. So your val is 20% of 80%. The split proportions aren't very straightforward in this way. – Monica Heddneck – 2018-06-14T19:22:37.847

I agree with @Monica Heddneck that the 64% train, 16% validation and 20% test split could be clearer. It's an annoying inference you have to make with this solution. – Perry – 2019-06-25T08:00:45.290

If `test_size` is an integer, this function takes exactly that many elements for the test set, so you can pre-compute the number of elements in each subset from your proportions and use those values to do a double split. – GJCode – 2019-11-10T10:39:42.520
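The suggestion above can be sketched as follows: pass integer counts to `train_test_split` so each subset has an exact size rather than an approximate proportion (the 60/20/20 target is an assumption carried over from the discussion):

```python
# Sketch: pre-compute exact subset sizes, then do two splits with integer
# test_size so the counts come out exact.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
n = len(X)              # 150 samples in iris
n_test = int(0.2 * n)   # 30
n_val = int(0.2 * n)    # 30

# First split peels off the test set; the second peels the validation
# set off the remainder.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=n_test, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=n_val, random_state=42)
print(len(X_train), len(X_val), len(X_test))  # 90 30 30
```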

I found this answer useful, so I thought I would add some explanatory text regarding the numbers. The first split creates 80% training+validation and 20% test. The second split starts with the 80% training+validation split and assigns 25% of this 80% to the validation split - this size comes from 0.25 X 0.80 = 0.20 (20%). So the validation split is 20%. So, now we have validation and testing at 20% each. The training split size is calculated as 75% of the 80% = 0.75 X 0.80 = 0.60 (60%). So, this gives a training split size of 60%. Overall, this gives 60%-20%-20% for train-validation-test. – edesz – 2020-10-20T03:53:52.190
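The arithmetic in the comment above can be checked directly: a 20% test split followed by a 25% split of the remaining 80% yields the 60/20/20 breakdown (a small sketch, with `random_state` chosen arbitrarily):

```python
# Verify: 0.25 of the remaining 0.80 is 0.20 validation, leaving 0.60 train.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=1)

print(len(X_train) / len(X), len(X_val) / len(X), len(X_test) / len(X))  # 0.6 0.2 0.2
```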

I don't have any labels... how do I do the split? – Charlie Parker – 2021-02-13T20:28:16.050