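- Pattern 1: modeling with all of the training data after missing-value handling, variable selection, and feature engineering
- Pattern 2: modeling with all of the raw data, without any missing-value handling, variable selection, or feature engineering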
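Note: wherever the setup articles below use brew install, substitute yum or apt as appropriate for your OS.
(MLJAR) Setting up three AutoML environments in Python
(AutoGluon) Setting up three AutoML environments in Python
(auto-sklearn) Setting up three AutoML environments in Python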
import pandas as pd
import numpy as np
# Load the Titanic training and evaluation datasets
df_train = pd.read_csv("/Users/hinomaruc/Desktop/blog/dataset/titanic/titanic_train.csv")
df_eval = pd.read_csv("/Users/hinomaruc/Desktop/blog/dataset/titanic/titanic_eval.csv")
# Explanatory variables
FEATURE_COLS = [
    'Age'
    , 'Fare'
    , 'SameTicketCnt'
    , 'Pclass_str_1'
    , 'Pclass_str_3'
    , 'Sex_female'
    , 'Embarked_Q'
    , 'Embarked_S'
]
X_train = df_train[FEATURE_COLS] # explanatory variables (train)
Y_train = df_train["Survived"] # target variable (train)
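Column names such as Pclass_str_1, Sex_female, Embarked_Q, and SameTicketCnt are the output of the feature engineering step, which is not shown in this post. The following is only a rough sketch of how columns like these could be derived. The toy rows, the intermediate `Pclass_str` column, and the definition of `SameTicketCnt` as "number of rows sharing a ticket" are all assumptions, not the original preprocessing.

```python
import pandas as pd

# Toy rows standing in for the raw Titanic data (values are made up).
raw = pd.DataFrame({
    "Pclass": [1, 3, 2, 3],
    "Sex": ["female", "male", "female", "male"],
    "Embarked": ["S", "Q", "C", "S"],
    "Ticket": ["A1", "B2", "A1", "C3"],
})

# Assumption: Pclass is treated as a string category before encoding.
raw["Pclass_str"] = raw["Pclass"].astype(str)

# One-hot encode the categorical columns; dtype=int yields 0/1 instead of booleans.
dummies = pd.get_dummies(
    raw[["Pclass_str", "Sex", "Embarked"]],
    columns=["Pclass_str", "Sex", "Embarked"],
    dtype=int,
)

# Assumption: SameTicketCnt counts how many rows share the same ticket number.
raw["SameTicketCnt"] = raw.groupby("Ticket")["Ticket"].transform("count")

features = pd.concat([raw[["SameTicketCnt"]], dummies], axis=1)
print(features[["SameTicketCnt", "Pclass_str_1", "Pclass_str_3",
                "Sex_female", "Embarked_Q", "Embarked_S"]])
```

Keeping only some of the dummy columns (for example Pclass_str_1 and Pclass_str_3 but not Pclass_str_2) avoids a redundant column per category.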
# https://supervised.mljar.com/api/
# Build the MLJAR model
from supervised.automl import AutoML
automl = AutoML(mode="Compete", random_state=100)
# Fit the model
automl.fit(X_train, Y_train)
AutoML directory: AutoML_3 The task is binary_classification with evaluation metric logloss AutoML will use algorithms: ['Decision Tree', 'Linear', 'Random Forest', 'Extra Trees', 'LightGBM', 'Xgboost', 'CatBoost', 'Neural Network', 'Nearest Neighbors'] AutoML will stack models AutoML will ensemble available models AutoML steps: ['adjust_validation', 'simple_algorithms', 'default_algorithms', 'not_so_random', 'golden_features', 'kmeans_features', 'insert_random_feature', 'features_selection', 'hill_climbing_1', 'hill_climbing_2', 'boost_on_errors', 'ensemble', 'stack', 'ensemble_stacked'] * Step adjust_validation will try to check up to 1 model 1_DecisionTree logloss 0.643133 trained in 1.21 seconds Adjust validation. Remove: 1_DecisionTree Validation strategy: 10-fold CV Shuffle,Stratify * Step simple_algorithms will try to check up to 4 models 1_DecisionTree logloss 0.535371 trained in 2.87 seconds 2_DecisionTree logloss 0.46238 trained in 2.78 seconds 3_DecisionTree logloss 0.46392 trained in 2.76 seconds 4_Linear logloss 0.452554 trained in 7.67 seconds * Step default_algorithms will try to check up to 7 models 5_Default_LightGBM logloss 0.402072 trained in 5.66 seconds 6_Default_Xgboost logloss 0.400937 trained in 5.37 seconds 7_Default_CatBoost logloss 0.384351 trained in 5.38 seconds 8_Default_NeuralNetwork logloss 0.490864 trained in 7.85 seconds 9_Default_RandomForest logloss 0.417678 trained in 11.12 seconds 10_Default_ExtraTrees logloss 0.420511 trained in 11.95 seconds 11_Default_NearestNeighbors logloss 0.909295 trained in 4.29 seconds * Step not_so_random will try to check up to 61 models 21_LightGBM logloss 0.398133 trained in 9.02 seconds 12_Xgboost logloss 0.391173 trained in 9.16 seconds 30_CatBoost logloss 0.387318 trained in 6.19 seconds 39_RandomForest logloss 0.407113 trained in 13.39 seconds 48_ExtraTrees logloss 0.411102 trained in 11.25 seconds 57_NeuralNetwork logloss 0.444187 trained in 9.94 seconds 66_NearestNeighbors logloss 0.834326 
trained in 5.44 seconds 22_LightGBM logloss 0.386149 trained in 7.45 seconds 13_Xgboost logloss 0.415666 trained in 8.3 seconds 31_CatBoost logloss 0.385593 trained in 9.93 seconds 40_RandomForest logloss 0.406411 trained in 17.76 seconds 49_ExtraTrees logloss 0.414009 trained in 18.35 seconds 58_NeuralNetwork logloss 0.439771 trained in 15.31 seconds 67_NearestNeighbors logloss 1.112462 trained in 7.37 seconds 23_LightGBM logloss 0.406987 trained in 9.88 seconds 14_Xgboost logloss 0.421778 trained in 10.6 seconds 32_CatBoost logloss 0.389089 trained in 15.27 seconds 41_RandomForest logloss 0.411355 trained in 16.84 seconds 50_ExtraTrees logloss 0.410128 trained in 14.09 seconds 59_NeuralNetwork logloss 0.473541 trained in 12.82 seconds 68_NearestNeighbors logloss 1.299103 trained in 8.58 seconds 24_LightGBM logloss 0.402071 trained in 10.49 seconds 15_Xgboost logloss 0.383564 trained in 11.58 seconds 33_CatBoost logloss 0.388264 trained in 11.46 seconds 42_RandomForest logloss 0.411391 trained in 16.71 seconds 51_ExtraTrees logloss 0.430014 trained in 19.28 seconds 60_NeuralNetwork logloss 0.466434 trained in 14.24 seconds 69_NearestNeighbors logloss 1.299103 trained in 9.21 seconds 25_LightGBM logloss 0.400083 trained in 11.1 seconds 16_Xgboost logloss 0.449111 trained in 11.83 seconds 34_CatBoost logloss 0.383657 trained in 13.94 seconds 43_RandomForest logloss 0.419977 trained in 19.14 seconds 52_ExtraTrees logloss 0.433074 trained in 17.28 seconds 61_NeuralNetwork logloss 0.509227 trained in 15.43 seconds 70_NearestNeighbors logloss 1.112462 trained in 10.47 seconds 26_LightGBM logloss 0.386131 trained in 12.3 seconds 17_Xgboost logloss 0.4757 trained in 13.62 seconds 35_CatBoost logloss 0.39084 trained in 14.67 seconds 44_RandomForest logloss 0.407182 trained in 19.22 seconds 53_ExtraTrees logloss 0.416177 trained in 19.04 seconds 62_NeuralNetwork logloss 0.685383 trained in 14.21 seconds 71_NearestNeighbors logloss 1.559218 trained in 11.71 seconds 
27_LightGBM logloss 0.402373 trained in 16.53 seconds 18_Xgboost logloss 0.541056 trained in 16.04 seconds 36_CatBoost logloss 0.388274 trained in 18.56 seconds 45_RandomForest logloss 0.411463 trained in 22.94 seconds 54_ExtraTrees logloss 0.414643 trained in 20.77 seconds 63_NeuralNetwork logloss 0.457195 trained in 18.83 seconds 72_NearestNeighbors logloss 1.299103 trained in 13.34 seconds 28_LightGBM logloss 0.396239 trained in 15.29 seconds 19_Xgboost logloss 0.403582 trained in 18.18 seconds 37_CatBoost logloss 0.390126 trained in 16.62 seconds 46_RandomForest logloss 0.405416 trained in 21.1 seconds 55_ExtraTrees logloss 0.398471 trained in 20.1 seconds 64_NeuralNetwork logloss 0.496227 trained in 18.8 seconds 29_LightGBM logloss 0.39758 trained in 16.64 seconds 20_Xgboost logloss 0.473729 trained in 17.73 seconds 38_CatBoost logloss 0.386961 trained in 18.37 seconds 47_RandomForest logloss 0.406615 trained in 27.31 seconds 56_ExtraTrees logloss 0.414916 trained in 23.71 seconds 65_NeuralNetwork logloss 0.45266 trained in 20.0 seconds * Step golden_features will try to check up to 3 models None 10 Add Golden Feature: Pclass_str_3_diff_Sex_female Add Golden Feature: Sex_female_multiply_SameTicketCnt Add Golden Feature: SameTicketCnt_ratio_Sex_female Add Golden Feature: Sex_female_ratio_SameTicketCnt Add Golden Feature: Age_ratio_Sex_female Add Golden Feature: Sex_female_multiply_Age Add Golden Feature: Sex_female_ratio_Age Add Golden Feature: Sex_female_sum_SameTicketCnt Add Golden Feature: Embarked_Q_sum_Sex_female Add Golden Feature: Sex_female_diff_Embarked_S Created 10 Golden Features in 13.04 seconds. 
15_Xgboost_GoldenFeatures logloss 0.38565 trained in 34.62 seconds 34_CatBoost_GoldenFeatures logloss 0.387767 trained in 22.28 seconds 7_Default_CatBoost_GoldenFeatures logloss 0.390365 trained in 18.69 seconds * Step kmeans_features will try to check up to 3 models 15_Xgboost_KMeansFeatures logloss 0.393264 trained in 23.62 seconds 34_CatBoost_KMeansFeatures logloss 0.391255 trained in 39.62 seconds 7_Default_CatBoost_KMeansFeatures logloss 0.392218 trained in 21.07 seconds * Step insert_random_feature will try to check up to 1 model 15_Xgboost_RandomFeature logloss 0.400385 trained in 20.27 seconds Drop features ['random_feature', 'Embarked_S', 'Embarked_Q'] * Step features_selection will try to check up to 6 models 15_Xgboost_SelectedFeatures logloss 0.388467 trained in 20.93 seconds 34_CatBoost_SelectedFeatures logloss 0.388058 trained in 20.91 seconds 26_LightGBM_SelectedFeatures logloss 0.386319 trained in 18.57 seconds 55_ExtraTrees_SelectedFeatures logloss 0.411567 trained in 24.17 seconds 46_RandomForest_SelectedFeatures logloss 0.40437 trained in 25.48 seconds 58_NeuralNetwork_SelectedFeatures logloss 0.428871 trained in 23.06 seconds * Step hill_climbing_1 will try to check up to 31 models 73_Xgboost logloss 0.381747 trained in 20.42 seconds 74_Xgboost logloss 0.385947 trained in 20.49 seconds 75_CatBoost logloss 0.387476 trained in 21.48 seconds 76_CatBoost logloss 0.384692 trained in 20.52 seconds 77_CatBoost logloss 0.385155 trained in 20.94 seconds 78_CatBoost logloss 0.382853 trained in 21.67 seconds 79_CatBoost logloss 0.386026 trained in 26.66 seconds 80_Xgboost_GoldenFeatures logloss 0.389585 trained in 22.6 seconds 81_Xgboost_GoldenFeatures logloss 0.384525 trained in 22.23 seconds 82_LightGBM logloss 0.386149 trained in 21.3 seconds 83_LightGBM logloss 0.380569 trained in 20.84 seconds 84_LightGBM logloss 0.386131 trained in 21.03 seconds 85_LightGBM_SelectedFeatures logloss 0.388061 trained in 22.38 seconds 86_LightGBM_SelectedFeatures 
logloss 0.38196 trained in 21.28 seconds 87_Xgboost_SelectedFeatures logloss 0.39152 trained in 22.81 seconds 88_Xgboost_SelectedFeatures logloss 0.391144 trained in 22.98 seconds 89_ExtraTrees logloss 0.410087 trained in 27.97 seconds 90_RandomForest_SelectedFeatures logloss 0.398531 trained in 27.54 seconds 91_RandomForest logloss 0.402316 trained in 31.35 seconds 92_RandomForest logloss 0.406411 trained in 30.2 seconds 93_ExtraTrees logloss 0.413942 trained in 28.46 seconds 94_ExtraTrees logloss 0.409957 trained in 29.7 seconds 95_ExtraTrees logloss 0.411102 trained in 27.93 seconds 96_NeuralNetwork_SelectedFeatures logloss 0.434763 trained in 27.42 seconds 97_NeuralNetwork_SelectedFeatures logloss 0.43951 trained in 25.95 seconds 98_NeuralNetwork logloss 0.4356 trained in 26.6 seconds 99_DecisionTree logloss 0.467542 trained in 23.15 seconds 100_DecisionTree logloss 0.646404 trained in 22.99 seconds 101_DecisionTree logloss 0.444153 trained in 23.72 seconds 102_DecisionTree logloss 0.444153 trained in 23.86 seconds 103_NearestNeighbors logloss 1.200042 trained in 23.7 seconds * Step hill_climbing_2 will try to check up to 12 models 104_LightGBM logloss 0.38445 trained in 24.67 seconds 105_Xgboost logloss 0.391009 trained in 27.53 seconds 106_LightGBM_SelectedFeatures logloss 0.389836 trained in 25.0 seconds 107_CatBoost logloss 0.388503 trained in 26.52 seconds 108_Xgboost logloss 0.388531 trained in 27.58 seconds 109_Xgboost_GoldenFeatures logloss 0.387964 trained in 27.87 seconds 110_LightGBM logloss 0.383974 trained in 31.19 seconds 111_RandomForest_SelectedFeatures logloss 0.396759 trained in 37.62 seconds 112_RandomForest logloss 0.397621 trained in 35.58 seconds 113_ExtraTrees logloss 0.421114 trained in 32.88 seconds 114_ExtraTrees logloss 0.402344 trained in 32.32 seconds 115_NeuralNetwork logloss 0.4562 trained in 40.35 seconds * Step boost_on_errors will try to check up to 1 model 83_LightGBM_BoostOnErrors logloss 0.390312 trained in 34.35 seconds * 
Step ensemble will try to check up to 1 model Ensemble logloss 0.370908 trained in 77.27 seconds * Step stack will try to check up to 60 models 83_LightGBM_Stacked logloss 0.366533 trained in 29.04 seconds 73_Xgboost_Stacked logloss 0.368775 trained in 35.0 seconds 78_CatBoost_Stacked logloss 0.363835 trained in 65.46 seconds 111_RandomForest_SelectedFeatures_Stacked logloss 0.378099 trained in 49.06 seconds 55_ExtraTrees_Stacked logloss 0.365257 trained in 38.91 seconds 58_NeuralNetwork_SelectedFeatures_Stacked logloss 0.415738 trained in 33.64 seconds 86_LightGBM_SelectedFeatures_Stacked logloss 0.366778 trained in 30.07 seconds 15_Xgboost_Stacked logloss 0.370325 trained in 36.88 seconds 34_CatBoost_Stacked logloss 0.364966 trained in 143.69 seconds 112_RandomForest_Stacked logloss 0.369609 trained in 49.01 seconds 114_ExtraTrees_Stacked logloss 0.365528 trained in 39.8 seconds 96_NeuralNetwork_SelectedFeatures_Stacked logloss 0.444507 trained in 792.28 seconds * Step ensemble_stacked will try to check up to 1 model Ensemble_Stacked logloss 0.355152 trained in 107.36 seconds AutoML fit time: 4292.47 seconds AutoML best model: Ensemble_Stacked
The run finished with "AutoML best model: Ensemble_Stacked".
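The MLJAR log is long, so it can help to pull out the model names and scores and sort them. The snippet below is a minimal sketch that uses only the standard library. The helper name `parse_mljar_log` is hypothetical, and the regular expression assumes the "NAME logloss X trained in Y seconds" pattern seen in the log above.

```python
import re

# Matches entries such as "7_Default_CatBoost logloss 0.384351 trained in 5.38 seconds".
# \s+ is used between tokens so that entries wrapped across lines still match.
ENTRY = re.compile(
    r"(\S+)\s+logloss\s+(\d+\.\d+)\s+trained in\s+(\d+(?:\.\d+)?)\s+seconds"
)


def parse_mljar_log(text):
    """Return (model_name, logloss, seconds) tuples sorted by ascending logloss."""
    rows = [(name, float(loss), float(sec)) for name, loss, sec in ENTRY.findall(text)]
    return sorted(rows, key=lambda row: row[1])


# A short excerpt copied from the log above.
sample = (
    "5_Default_LightGBM logloss 0.402072 trained in 5.66 seconds "
    "6_Default_Xgboost logloss 0.400937 trained in 5.37 seconds "
    "7_Default_CatBoost logloss 0.384351 trained in 5.38 seconds"
)

for name, loss, sec in parse_mljar_log(sample):
    print(f"{name:<22} {loss:.6f} {sec:6.2f}s")
```

Lower logloss is better, so the first row is the strongest model in the excerpt.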
df_eval["Survived"] = automl.predict(df_eval[FEATURE_COLS])
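The submit command uploads titanic_submission.csv, but the step that writes that file is not shown here. A minimal sketch of that step follows, assuming df_eval carries the PassengerId column from the Kaggle data. The toy DataFrame stands in for the real df_eval.

```python
import pandas as pd

# Toy stand-in for df_eval after prediction (values are made up).
df_eval = pd.DataFrame({"PassengerId": [892, 893, 894], "Survived": [0, 1, 0]})

# The Kaggle Titanic competition expects exactly these two columns.
submission = df_eval[["PassengerId", "Survived"]].astype({"Survived": int})

# index=False keeps the pandas row index out of the file.
submission.to_csv("titanic_submission.csv", index=False)
```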
!/Users/hinomaruc/Desktop/blog/my-venv/bin/kaggle competitions submit -c titanic -f titanic_submission.csv -m "model #011. mljar pattern 1"
from autogluon.tabular import TabularPredictor
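# label="Survived" requires the label column to be present in the training data, so pass the features together with Survived
predictor = TabularPredictor(label="Survived", problem_type="binary", path="RESULT_AUTOGLUON").fit(df_train[FEATURE_COLS + ["Survived"]], time_limit=600)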
Beginning AutoGluon training ... Time limit = 600s AutoGluon will save models to "RESULT_AUTOGLUON/" AutoGluon Version: 0.4.2 Python Version: 3.8.13 Operating System: Darwin Train Data Rows: 891 Train Data Columns: 8 Label Column: Survived Preprocessing data ... Selected class <--> label mapping: class 1 = 1, class 0 = 0 Using Feature Generators to preprocess the data ... Fitting AutoMLPipelineFeatureGenerator... Available Memory: 11537.1 MB Train Data (Original) Memory Usage: 0.06 MB (0.0% of available memory) Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features. Stage 1 Generators: Fitting AsTypeFeatureGenerator... Note: Converting 5 features to boolean dtype as they only contain 2 unique values. Stage 2 Generators: Fitting FillNaFeatureGenerator... Stage 3 Generators: Fitting IdentityFeatureGenerator... Stage 4 Generators: Fitting DropUniqueFeatureGenerator... Types of features in original data (raw dtype, special dtypes): ('float', []) : 7 | ['Age', 'Fare', 'Pclass_str_1', 'Pclass_str_3', 'Sex_female', ...] ('int', []) : 1 | ['SameTicketCnt'] Types of features in processed data (raw dtype, special dtypes): ('float', []) : 2 | ['Age', 'Fare'] ('int', []) : 1 | ['SameTicketCnt'] ('int', ['bool']) : 5 | ['Pclass_str_1', 'Pclass_str_3', 'Sex_female', 'Embarked_Q', 'Embarked_S'] 0.1s = Fit runtime 8 features in original data used to generate 8 features in processed data. Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory) Data preprocessing and feature engineering runtime = 0.12s ... AutoGluon will gauge predictive performance using evaluation metric: 'accuracy' To change this, specify the eval_metric parameter of Predictor() Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 712, Val Rows: 179 Fitting 13 L1 models ... Fitting model: KNeighborsUnif ... Training model for up to 599.88s of the 599.88s of remaining time. 
0.6592 = Validation score (accuracy) 0.02s = Training runtime 0.04s = Validation runtime Fitting model: KNeighborsDist ... Training model for up to 599.8s of the 599.8s of remaining time. 0.6704 = Validation score (accuracy) 0.01s = Training runtime 0.01s = Validation runtime Fitting model: LightGBMXT ... Training model for up to 599.77s of the 599.77s of remaining time. 0.8212 = Validation score (accuracy) 2.65s = Training runtime 0.01s = Validation runtime Fitting model: LightGBM ... Training model for up to 597.1s of the 597.1s of remaining time. 0.838 = Validation score (accuracy) 0.4s = Training runtime 0.01s = Validation runtime Fitting model: RandomForestGini ... Training model for up to 596.68s of the 596.68s of remaining time. 0.7989 = Validation score (accuracy) 1.13s = Training runtime 0.08s = Validation runtime Fitting model: RandomForestEntr ... Training model for up to 595.42s of the 595.42s of remaining time. 0.7989 = Validation score (accuracy) 0.8s = Training runtime 0.09s = Validation runtime Fitting model: CatBoost ... Training model for up to 594.48s of the 594.47s of remaining time. 0.8547 = Validation score (accuracy) 1.43s = Training runtime 0.0s = Validation runtime Fitting model: ExtraTreesGini ... Training model for up to 593.03s of the 593.03s of remaining time. 0.7877 = Validation score (accuracy) 0.78s = Training runtime 0.08s = Validation runtime Fitting model: ExtraTreesEntr ... Training model for up to 592.11s of the 592.11s of remaining time. 0.7821 = Validation score (accuracy) 0.78s = Training runtime 0.12s = Validation runtime Fitting model: NeuralNetFastAI ... Training model for up to 591.16s of the 591.15s of remaining time. 0.8324 = Validation score (accuracy) 5.45s = Training runtime 0.02s = Validation runtime Fitting model: XGBoost ... Training model for up to 585.66s of the 585.66s of remaining time. 0.8436 = Validation score (accuracy) 0.59s = Training runtime 0.01s = Validation runtime Fitting model: NeuralNetTorch ... 
Training model for up to 585.04s of the 585.04s of remaining time. 0.8045 = Validation score (accuracy) 3.17s = Training runtime 0.02s = Validation runtime Fitting model: LightGBMLarge ... Training model for up to 581.85s of the 581.84s of remaining time. 0.8324 = Validation score (accuracy) 0.67s = Training runtime 0.01s = Validation runtime Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 580.44s of remaining time. 0.8603 = Validation score (accuracy) 0.74s = Training runtime 0.0s = Validation runtime AutoGluon training complete, total runtime = 20.39s ... Best model: "WeightedEnsemble_L2" TabularPredictor saved. To load, use: predictor = TabularPredictor.load("RESULT_AUTOGLUON/")
df_eval["Survived"] = predictor.predict(df_eval)
!/Users/hinomaruc/Desktop/blog/my-venv/bin/kaggle competitions submit -c titanic -f titanic_submission.csv -m "model #011. autogluon pattern 1"
import autosklearn.classification
cls = autosklearn.classification.AutoSklearnClassifier()
cls.fit(X_train, Y_train)
df_eval["Survived"] = cls.predict(df_eval[FEATURE_COLS])
!/Users/hinomaruc/Desktop/blog/my-venv/bin/kaggle competitions submit -c titanic -f titanic_submission.csv -m "model #011. autosklearn pattern 1"
- Pattern 2: modeling with all of the raw data, without any missing-value handling, variable selection, or feature engineering
# https://supervised.mljar.com/api/
from supervised.automl import AutoML
automl = AutoML(mode="Compete", random_state=100)
AutoML directory: AutoML_4 The task is binary_classification with evaluation metric logloss AutoML will use algorithms: ['Decision Tree', 'Linear', 'Random Forest', 'Extra Trees', 'LightGBM', 'Xgboost', 'CatBoost', 'Neural Network', 'Nearest Neighbors'] AutoML will stack models AutoML will ensemble available models AutoML steps: ['adjust_validation', 'simple_algorithms', 'default_algorithms', 'not_so_random', 'mix_encoding', 'golden_features', 'kmeans_features', 'insert_random_feature', 'features_selection', 'hill_climbing_1', 'hill_climbing_2', 'boost_on_errors', 'ensemble', 'stack', 'ensemble_stacked'] * Step adjust_validation will try to check up to 1 model 1_DecisionTree logloss 0.461027 trained in 2.64 seconds Adjust validation. Remove: 1_DecisionTree Validation strategy: 10-fold CV Shuffle,Stratify * Step simple_algorithms will try to check up to 4 models 1_DecisionTree logloss 0.588967 trained in 12.97 seconds 2_DecisionTree logloss 0.42534 trained in 11.45 seconds 3_DecisionTree logloss 0.457723 trained in 11.2 seconds 4_Linear logloss 0.526347 trained in 23.32 seconds * Step default_algorithms will try to check up to 7 models 5_Default_LightGBM logloss 0.405131 trained in 14.45 seconds 6_Default_Xgboost logloss 0.403659 trained in 14.77 seconds 7_Default_CatBoost logloss 0.395797 trained in 18.35 seconds 8_Default_NeuralNetwork logloss 0.76738 trained in 20.43 seconds 9_Default_RandomForest logloss 0.398352 trained in 24.21 seconds 10_Default_ExtraTrees logloss 0.398378 trained in 26.34 seconds 11_Default_NearestNeighbors logloss 1.098024 trained in 16.76 seconds * Step not_so_random will try to check up to 61 models 21_LightGBM logloss 0.399716 trained in 14.63 seconds 12_Xgboost logloss 0.406368 trained in 16.04 seconds 30_CatBoost logloss 0.40208 trained in 17.71 seconds 39_RandomForest logloss 0.404127 trained in 32.21 seconds 48_ExtraTrees logloss 0.403249 trained in 27.43 seconds 57_NeuralNetwork logloss 0.584528 trained in 21.8 seconds 
66_NearestNeighbors logloss 0.854819 trained in 18.14 seconds 22_LightGBM logloss 0.410192 trained in 15.57 seconds 13_Xgboost logloss 0.408691 trained in 17.36 seconds 31_CatBoost logloss 0.395837 trained in 29.12 seconds 40_RandomForest logloss 0.398086 trained in 33.49 seconds 49_ExtraTrees logloss 0.403386 trained in 37.29 seconds 58_NeuralNetwork logloss 0.701188 trained in 34.28 seconds 67_NearestNeighbors logloss 0.847123 trained in 25.93 seconds 23_LightGBM logloss 0.425954 trained in 20.98 seconds 14_Xgboost logloss 0.413054 trained in 27.48 seconds 32_CatBoost logloss 0.398507 trained in 61.67 seconds 41_RandomForest logloss 0.400717 trained in 34.67 seconds 50_ExtraTrees logloss 0.390717 trained in 29.41 seconds 59_NeuralNetwork logloss 0.850408 trained in 24.97 seconds 68_NearestNeighbors logloss 1.652493 trained in 21.87 seconds 24_LightGBM logloss 0.402971 trained in 18.79 seconds 15_Xgboost logloss 0.39368 trained in 20.78 seconds 33_CatBoost logloss 0.402891 trained in 27.12 seconds 42_RandomForest logloss 0.398977 trained in 29.68 seconds 51_ExtraTrees logloss 0.406206 trained in 28.03 seconds 60_NeuralNetwork logloss 0.976241 trained in 29.27 seconds 69_NearestNeighbors logloss 1.652493 trained in 23.4 seconds 25_LightGBM logloss 0.400252 trained in 21.61 seconds 16_Xgboost logloss 0.439921 trained in 24.78 seconds 34_CatBoost logloss 0.403254 trained in 37.62 seconds 43_RandomForest logloss 0.403334 trained in 36.33 seconds 52_ExtraTrees logloss 0.409987 trained in 34.3 seconds 61_NeuralNetwork logloss 0.911824 trained in 30.49 seconds 70_NearestNeighbors logloss 0.847123 trained in 24.23 seconds 26_LightGBM logloss 0.409352 trained in 21.49 seconds 17_Xgboost logloss 0.46196 trained in 23.23 seconds 35_CatBoost logloss 0.403134 trained in 36.35 seconds 44_RandomForest logloss 0.399452 trained in 35.27 seconds 53_ExtraTrees logloss 0.407137 trained in 35.21 seconds 62_NeuralNetwork logloss 0.960051 trained in 30.22 seconds 71_NearestNeighbors 
logloss 1.653601 trained in 24.74 seconds 27_LightGBM logloss 0.406186 trained in 26.0 seconds 18_Xgboost logloss 0.475936 trained in 24.12 seconds 36_CatBoost logloss 0.395098 trained in 38.38 seconds 45_RandomForest logloss 0.401126 trained in 38.59 seconds 54_ExtraTrees logloss 0.418287 trained in 34.39 seconds 63_NeuralNetwork logloss 0.67047 trained in 33.34 seconds 72_NearestNeighbors logloss 1.652493 trained in 28.84 seconds 28_LightGBM logloss 0.397014 trained in 26.25 seconds * Step mix_encoding will try to check up to 1 model 15_Xgboost_categorical_mix logloss 0.394184 trained in 30.49 seconds * Step golden_features will try to check up to 3 models None 10 Add Golden Feature: Parch_sum_SibSp Add Golden Feature: SibSp_sum_Pclass Add Golden Feature: SibSp_ratio_Parch Add Golden Feature: Pclass_diff_Parch Add Golden Feature: SibSp_ratio_Fare Add Golden Feature: SibSp_multiply_Pclass Add Golden Feature: Parch_multiply_SibSp Add Golden Feature: Parch_ratio_SibSp Add Golden Feature: SibSp_diff_Parch Add Golden Feature: Parch_multiply_Pclass Created 10 Golden Features in 13.19 seconds. 
50_ExtraTrees_GoldenFeatures logloss 0.39349 trained in 56.81 seconds 15_Xgboost_GoldenFeatures logloss 0.39824 trained in 30.44 seconds 15_Xgboost_categorical_mix_GoldenFeatures logloss 0.396049 trained in 30.37 seconds * Step kmeans_features will try to check up to 3 models 50_ExtraTrees_KMeansFeatures logloss 0.391718 trained in 42.41 seconds 15_Xgboost_KMeansFeatures logloss 0.405067 trained in 33.91 seconds 15_Xgboost_categorical_mix_KMeansFeatures logloss 0.402308 trained in 35.31 seconds * Step insert_random_feature will try to check up to 1 model 50_ExtraTrees_RandomFeature logloss 0.398504 trained in 137.61 seconds Drop features ['Ticket_1601', 'Ticket_113781', 'Age', 'Ticket_347082', 'Name_rev', 'Ticket_2666', 'Ticket_ston', 'Name_dr', 'Name_robert', 'random_feature', 'Ticket_17755', 'Ticket_ca', 'Ticket_29106', 'Name_joseph', 'Name_william', 'Ticket_2343', 'Name_leonard', 'Name_kate', 'Name_peter', 'Name_johan', 'Name_skoog', 'Ticket_347088', 'Ticket_113760', 'Name_edward', 'Ticket_11767', 'Name_emily', 'Name_ivan', 'Name_martha', 'Ticket_17569', 'Ticket_237736', 'Name_palsson', 'Name_viktor', 'Ticket_2699', 'Ticket_347080', 'Name_arnold', 'Ticket_28403', 'Name_carl', 'Ticket_250647', 'Ticket_248738', 'Ticket_36947', 'Ticket_367230', 'Ticket_370129', 'Ticket_19943', 'Ticket_2668', 'Name_baclini', 'Name_hart', 'Name_van', 'Ticket_31921', 'Ticket_367226', 'Name_catherine', 'Ticket_364516', 'Ticket_110152', 'Ticket_34651', 'Ticket_36973', 'Ticket_220845', 'Name_marion', 'Ticket_230433', 'Ticket_2659', 'Name_vander', 'Name_nils', 'Ticket_349909', 'Ticket_345773', 'Name_daniel', 'Name_fortune', 'Name_walter', 'Ticket_36928', 'Ticket_17604', 'Ticket_17611', 'Ticket_17593', 'Ticket_19928', 'Ticket_17474', 'Name_stanley', 'Name_harry', 'Ticket_2691', 'Ticket_110413', 'Name_jane', 'Name_louise', 'Ticket_371110', 'Ticket_48871', 'Name_karl', 'Name_david', 'Ticket_347742', 'Name_hugh', 'Name_lefebre', 'Ticket_paris', 'Ticket_2678', 'Ticket_13529', 'Name_elias', 
'Name_martin', 'Name_frank', 'Ticket_o2', 'Ticket_2653', 'Ticket_2665', 'Ticket_370365', 'Ticket_19996', 'Name_bertram', 'Ticket_24160', 'Name_gustaf', 'Name_ellen', 'Name_richards', 'Name_charles', 'Name_panula', 'Ticket_sc', 'Name_alice', 'Ticket_244252', 'Name_elsie', 'Name_matilda', 'Ticket_17558', 'Ticket_244367', 'Name_thayer', 'Ticket_349237', 'Ticket_2651', 'Ticket_2661', 'Ticket_2315', 'Name_hansen', 'Name_brown', 'Ticket_248727', 'Ticket_363291', 'Ticket_14879', 'Ticket_6608', 'Ticket_239853', 'Ticket_19950', 'Ticket_17582', 'Ticket_31027', 'Ticket_3336', 'Ticket_3101279', 'Name_boulos', 'Name_benjamin', 'Ticket_ah', 'Ticket_2079', 'Ticket_250655', 'Ticket_19877', 'Ticket_2673', 'Ticket_17761', 'Ticket_230136', 'Ticket_26360', 'Ticket_382652', 'Ticket_17572', 'Name_williams', 'Ticket_soton', 'Name_alfred', 'Name_sage', 'Ticket_347077', 'Name_sofia', 'Ticket_2123', 'Ticket_line', 'Ticket_17760', 'Ticket_9549', 'Name_kelly', 'Ticket_12749', 'Name_ernst', 'Name_johansson', 'Name_goodwin', 'Ticket_113789', 'Name_hans', 'Name_rice', 'Name_marie', 'Ticket_113505', 'Ticket_2144', 'Ticket_751', 'Ticket_6607', 'Name_norman', 'Ticket_17757', 'Ticket_37671', 'Name_percival', 'Ticket_392096', 'Ticket_54636', 'Ticket_17421', 'Name_francis', 'Name_victor', 'Name_augusta', 'Name_august', 'Ticket_230080', 'Name_jr', 'Name_samuel', 'Name_albert', 'Ticket_35273', 'Ticket_17758', 'Name_johnson', 'Name_alexander', 'Ticket_3101295', 'Ticket_oq', 'Name_margaret', 'Name_olsen', 'Name_bertha', 'Ticket_358585', 'Name_jensen', 'Name_elisabeth', 'Ticket_29750', 'Ticket_17485', 'Name_asplund', 'Ticket_243847', 'Ticket_17477', 'Name_ada', 'Ticket_250649', 'Ticket_3381', 'Ticket_33112', 'Ticket_239865', 'Ticket_13502', 'Name_thomas', 'Ticket_35281', 'Ticket_3101278', 'Ticket_347054', 'Name_florence', 'Name_patrick', 'Name_anne', 'Name_richard', 'Ticket_345764', 'Name_harper', 'Name_edith', 'Name_gustafsson', 'Name_hanna', 'Name_smith', 'Ticket_364849', 'Ticket_7534', 'Ticket_111361', 
'Ticket_2627', 'Ticket_250644', 'Ticket_231919', 'Name_emil', 'Name_katherine', 'Name_andrew', 'Ticket_17608', 'Name_sidney', 'Ticket_2908', 'Ticket_376564', 'Name_oskar', 'Name_bourke', 'Ticket_17453', 'Ticket_113572', 'Ticket_113803', 'Ticket_110465', 'Name_douglas', 'Name_ford', 'Name_henry', 'Ticket_4133', 'Name_anna', 'Name_john', 'Name_frederick', 'Name_carter', 'Name_ernest', 'Name_harris', 'Name_james', 'Name_helen', 'Name_arthur', 'Name_annie', 'Name_elizabeth', 'Name_mary', 'Name_george', 'Ticket_pp', 'Ticket_pc', 'Parch', 'Name_andersson', 'Name_maria'] * Step features_selection will try to check up to 6 models 50_ExtraTrees_SelectedFeatures logloss 0.396132 trained in 35.84 seconds 15_Xgboost_SelectedFeatures logloss 0.404208 trained in 34.39 seconds 36_CatBoost_SelectedFeatures logloss 0.402618 trained in 41.61 seconds 28_LightGBM_SelectedFeatures logloss 0.406602 trained in 40.25 seconds 40_RandomForest_SelectedFeatures logloss 0.406838 trained in 41.95 seconds 57_NeuralNetwork_SelectedFeatures logloss 0.425396 trained in 33.05 seconds * Step hill_climbing_1 will try to check up to 32 models 73_ExtraTrees logloss 0.390712 trained in 41.55 seconds 74_ExtraTrees logloss 0.401999 trained in 41.04 seconds 75_ExtraTrees logloss 0.38248 trained in 40.55 seconds 76_ExtraTrees logloss 0.400482 trained in 41.51 seconds 77_ExtraTrees_GoldenFeatures logloss 0.387611 trained in 40.97 seconds 78_ExtraTrees_GoldenFeatures logloss 0.402254 trained in 47.96 seconds * Step hill_climbing_2 will try to check up to 30 models 79_ExtraTrees logloss 0.384233 trained in 45.27 seconds 80_ExtraTrees_GoldenFeatures logloss 0.386553 trained in 42.1 seconds 81_ExtraTrees logloss 0.391268 trained in 39.77 seconds 82_Xgboost logloss 0.394635 trained in 33.64 seconds 83_Xgboost logloss 0.391803 trained in 34.54 seconds 84_Xgboost logloss 0.39389 trained in 34.75 seconds 85_Xgboost logloss 0.392496 trained in 34.79 seconds * Step boost_on_errors will try to check up to 1 model 
75_ExtraTrees_BoostOnErrors not trained. Force to stop the training. Total time for AutoML training already exceeded. * Step ensemble will try to check up to 1 model Ensemble logloss 0.373248 trained in 5148.8 seconds Skip stack because no parameters were generated. Skip ensemble_stacked because no parameters were generated. AutoML fit time: 32704.8 seconds AutoML best model: Ensemble
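Every score in the MLJAR logs is a logloss (binary cross-entropy), where lower is better. As a reference point, here is a minimal standard-library sketch of how that metric is computed. The function name `log_loss` and the clipping constant `eps` are illustrative choices, not MLJAR's internal implementation.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so that log(0) can never occur.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Confident and correct predictions give a small loss.
print(log_loss([1, 0, 1], [0.9, 0.2, 0.6]))

# Always predicting 0.5 gives ln(2), roughly 0.693, whatever the labels are.
print(log_loss([1, 0], [0.5, 0.5]))
```

Against that ln(2) baseline, the NearestNeighbors entries with logloss above 1.0 do worse than a constant 0.5 prediction, while the ensembles at about 0.36 to 0.37 are clearly better.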
df_eval["Survived"] = automl.predict(df_eval)
!/Users/hinomaruc/Desktop/blog/my-venv/bin/kaggle competitions submit -c titanic -f titanic_submission.csv -m "model #011. mljar pattern 2"
from autogluon.tabular import TabularPredictor
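# X_train here must be the raw training DataFrame including the Survived label column (the log reports 11 feature columns)
predictor = TabularPredictor(label="Survived", problem_type="binary", path="RESULT_AUTOGLUON").fit(X_train, time_limit=600)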
Beginning AutoGluon training ... Time limit = 600s AutoGluon will save models to "RESULT_AUTOGLUON/" AutoGluon Version: 0.4.2 Python Version: 3.8.13 Operating System: Darwin Train Data Rows: 891 Train Data Columns: 11 Label Column: Survived Preprocessing data ... Selected class <--> label mapping: class 1 = 1, class 0 = 0 Using Feature Generators to preprocess the data ... Fitting AutoMLPipelineFeatureGenerator... Available Memory: 10859.32 MB Train Data (Original) Memory Usage: 0.31 MB (0.0% of available memory) Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features. Stage 1 Generators: Fitting AsTypeFeatureGenerator... Note: Converting 1 features to boolean dtype as they only contain 2 unique values. Stage 2 Generators: Fitting FillNaFeatureGenerator... Stage 3 Generators: Fitting IdentityFeatureGenerator... Fitting CategoryFeatureGenerator... Fitting CategoryMemoryMinimizeFeatureGenerator... Fitting TextSpecialFeatureGenerator... Fitting BinnedFeatureGenerator... Fitting DropDuplicatesFeatureGenerator... Fitting TextNgramFeatureGenerator... Fitting CountVectorizer for text features: ['Name'] CountVectorizer fit with vocabulary size = 8 Stage 4 Generators: Fitting DropUniqueFeatureGenerator... Types of features in original data (raw dtype, special dtypes): ('float', []) : 2 | ['Age', 'Fare'] ('int', []) : 4 | ['PassengerId', 'Pclass', 'SibSp', 'Parch'] ('object', []) : 4 | ['Sex', 'Ticket', 'Cabin', 'Embarked'] ('object', ['text']) : 1 | ['Name'] Types of features in processed data (raw dtype, special dtypes): ('category', []) : 3 | ['Ticket', 'Cabin', 'Embarked'] ('float', []) : 2 | ['Age', 'Fare'] ('int', []) : 4 | ['PassengerId', 'Pclass', 'SibSp', 'Parch'] ('int', ['binned', 'text_special']) : 9 | ['Name.char_count', 'Name.word_count', 'Name.capital_ratio', 'Name.lower_ratio', 'Name.special_ratio', ...] 
('int', ['bool']) : 1 | ['Sex'] ('int', ['text_ngram']) : 9 | ['__nlp__.henry', '__nlp__.john', '__nlp__.master', '__nlp__.miss', '__nlp__.mr', ...] 0.8s = Fit runtime 11 features in original data used to generate 28 features in processed data. Train Data (Processed) Memory Usage: 0.07 MB (0.0% of available memory) Data preprocessing and feature engineering runtime = 0.92s ... AutoGluon will gauge predictive performance using evaluation metric: 'accuracy' To change this, specify the eval_metric parameter of Predictor() Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 712, Val Rows: 179 Fitting 13 L1 models ... Fitting model: KNeighborsUnif ... Training model for up to 599.07s of the 599.07s of remaining time. 0.6536 = Validation score (accuracy) 0.12s = Training runtime 0.05s = Validation runtime Fitting model: KNeighborsDist ... Training model for up to 598.87s of the 598.87s of remaining time. 0.6536 = Validation score (accuracy) 0.07s = Training runtime 0.03s = Validation runtime Fitting model: LightGBMXT ... Training model for up to 598.75s of the 598.74s of remaining time. 0.8156 = Validation score (accuracy) 2.66s = Training runtime 0.02s = Validation runtime Fitting model: LightGBM ... Training model for up to 596.04s of the 596.04s of remaining time. 0.8212 = Validation score (accuracy) 0.53s = Training runtime 0.02s = Validation runtime Fitting model: RandomForestGini ... Training model for up to 595.48s of the 595.47s of remaining time. 0.8156 = Validation score (accuracy) 1.31s = Training runtime 0.1s = Validation runtime Fitting model: RandomForestEntr ... Training model for up to 594.01s of the 594.01s of remaining time. 0.8156 = Validation score (accuracy) 0.96s = Training runtime 0.13s = Validation runtime Fitting model: CatBoost ... Training model for up to 592.86s of the 592.85s of remaining time. 
0.8268 = Validation score (accuracy) 1.71s = Training runtime 0.02s = Validation runtime Fitting model: ExtraTreesGini ... Training model for up to 591.12s of the 591.12s of remaining time. 0.8101 = Validation score (accuracy) 1.03s = Training runtime 0.11s = Validation runtime Fitting model: ExtraTreesEntr ... Training model for up to 589.93s of the 589.92s of remaining time. 0.8101 = Validation score (accuracy) 1.01s = Training runtime 0.11s = Validation runtime Fitting model: NeuralNetFastAI ... Training model for up to 588.73s of the 588.72s of remaining time. No improvement since epoch 9: early stopping 0.8268 = Validation score (accuracy) 7.76s = Training runtime 0.04s = Validation runtime Fitting model: XGBoost ... Training model for up to 580.89s of the 580.88s of remaining time. 0.8101 = Validation score (accuracy) 0.8s = Training runtime 0.02s = Validation runtime Fitting model: NeuralNetTorch ... Training model for up to 580.05s of the 580.04s of remaining time. 0.8492 = Validation score (accuracy) 8.53s = Training runtime 0.04s = Validation runtime Fitting model: LightGBMLarge ... Training model for up to 571.47s of the 571.47s of remaining time. 0.8324 = Validation score (accuracy) 1.61s = Training runtime 0.02s = Validation runtime Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 568.72s of remaining time. 0.8603 = Validation score (accuracy) 0.89s = Training runtime 0.0s = Validation runtime AutoGluon training complete, total runtime = 32.25s ... Best model: "WeightedEnsemble_L2" TabularPredictor saved. To load, use: predictor = TabularPredictor.load("RESULT_AUTOGLUON/")
df_eval["Survived"] = predictor.predict(df_eval)
!/Users/hinomaruc/Desktop/blog/my-venv/bin/kaggle competitions submit -c titanic -f titanic_submission.csv -m "model #011. autogluon pattern 2"
# Build the model
import autosklearn.classification
cls = autosklearn.classification.AutoSklearnClassifier()
cls.fit(X_train, Y_train)
ValueError: Input Column Name has invalid type object.
Cast it to a valid dtype before using it in Auto-Sklearn. Valid types are numerical, categorical or boolean.
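The error message says auto-sklearn accepts only numerical, categorical, or boolean columns, so raw string columns such as Name have to be cast before fitting. The following is a minimal sketch of one way to do that with pandas. The toy DataFrame and the names `X_raw` and `X_cast` are illustrative, and this cast was not run as part of the experiment above.

```python
import pandas as pd

# Toy rows mimicking a few raw Titanic columns (values are made up).
X_raw = pd.DataFrame({
    "Name": ["Smith, Mr. John", "Brown, Miss. Anna"],
    "Sex": ["male", "female"],
    "Age": [22.0, 38.0],
})

X_cast = X_raw.copy()
for col in X_cast.columns:
    # Leave numeric and boolean columns alone; cast everything else to category.
    if not pd.api.types.is_numeric_dtype(X_cast[col]):
        X_cast[col] = X_cast[col].astype("category")

print(X_cast.dtypes)
```

The same cast would have to be applied to the evaluation data before calling predict. Columns with nearly unique values, such as Name or Ticket, may be better dropped than cast, since a category per row carries little signal.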
df_eval["Survived"] = cls.predict(df_eval)
!/Users/hinomaruc/Desktop/blog/my-venv/bin/kaggle competitions submit -c titanic -f titanic_submission.csv -m "model #011. autosklearn pattern 2"