Data preprocessing
# load the libraries
import numpy as np
import pandas as pd
import tensorflow as tf
df = pd.read_csv('/work/Churn_Modelling.csv')
df.head()
# define the feature matrix X and the target vector y
X = df.iloc[:,3:-1].values
y = df.iloc[:,-1].values # last column, 'Exited': whether the client churned
print(y)
[1 0 1 ... 1 1 0]
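The slice `iloc[:, 3:-1]` drops the first three columns and the target. A minimal sketch with a hypothetical two-row frame mirroring the Churn_Modelling.csv layout (column names assumed from the standard Kaggle dataset):

```python
import pandas as pd

# Hypothetical mini-frame: the first three columns (RowNumber,
# CustomerId, Surname) carry no predictive signal, and the last
# column (Exited) is the target.
mini = pd.DataFrame({
    'RowNumber': [1, 2],
    'CustomerId': [101, 102],
    'Surname': ['Doe', 'Roe'],
    'CreditScore': [600, 700],
    'Geography': ['France', 'Spain'],
    'Exited': [1, 0],
})
X_demo = mini.iloc[:, 3:-1].values  # feature columns only
y_demo = mini.iloc[:, -1].values    # target column
print(X_demo.shape)  # (2, 2)
print(y_demo)        # [1 0]
```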
Encoding categorical variables
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:,2] = le.fit_transform(X[:,2]) # binary-encode the gender column; LabelEncoder sorts classes alphabetically, so Female = 0, Male = 1
df.dtypes.unique()
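As a quick standalone check of how LabelEncoder assigns its codes (values invented): it sorts the distinct classes alphabetically before numbering them.

```python
from sklearn.preprocessing import LabelEncoder

le_demo = LabelEncoder()
codes = le_demo.fit_transform(['Male', 'Female', 'Female', 'Male'])
print(codes)            # [1 0 0 1]: 'Female' -> 0, 'Male' -> 1
print(le_demo.classes_) # classes stored in sorted order
```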
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers = [('encoder', OneHotEncoder(), [1])], remainder='passthrough') # one-hot encode column [1] (the countries); all remaining columns pass through unchanged
X = np.array(ct.fit_transform(X)) # apply the transformation
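A minimal, self-contained sketch of what the ColumnTransformer does to a toy three-row feature matrix (values invented): the single country column at index 1 is replaced by one dummy column per distinct country, and the dummy columns are placed in front of the passthrough columns.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Tiny stand-in feature matrix: column 1 holds the country.
X_demo = np.array([
    [600, 'France',  1],
    [700, 'Spain',   0],
    [650, 'Germany', 1],
], dtype=object)

ct_demo = ColumnTransformer(
    transformers=[('encoder', OneHotEncoder(), [1])],
    remainder='passthrough')
X_enc = np.array(ct_demo.fit_transform(X_demo))
# three distinct countries -> three dummy columns + two passthrough columns
print(X_enc.shape)  # (3, 5)
```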
Splitting the data for training and testing
train_test_split returns four arrays: we pass it our data, the fraction reserved for testing (20% test / 80% training), and a random_state controlling whether the shuffle is reproducible.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0) # split the data into training and test sets
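The split proportions can be checked on a tiny synthetic array: with test_size=0.2, 20% of the rows end up in the test set, and fixing random_state makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(10).reshape(10, 1)  # 10 hypothetical samples
labels = np.arange(10)

tr_X, te_X, tr_y, te_y = train_test_split(
    data, labels, test_size=0.2, random_state=0)
print(len(tr_X), len(te_X))  # 8 2
```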
Rescaling the data: we standardize the features so that extreme values do not dominate the weights.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
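The key point is that the scaler's mean and standard deviation are learned from the training set only and then reused on the test set, so no test information leaks into training. A small sketch with made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[2.0]])

sc = StandardScaler()
train_s = sc.fit_transform(train)  # statistics learned from train only
test_s = sc.transform(test)        # same statistics reused on test

print(train_s.mean())  # ~0 after standardization
print(test_s)          # [[0.]] because 2.0 is exactly the train mean
```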
Building the neural network
ann = tf.keras.models.Sequential() # initialize the network as a sequence of layers
ann.add(tf.keras.layers.Dense(units = 7, activation = 'relu')) # input layer plus first hidden layer
ann.add(tf.keras.layers.Dense(units = 7, activation = 'relu')) # second hidden layer
ann.add(tf.keras.layers.Dense(units = 1, activation = 'sigmoid')) # output layer: one unit squashed into (0, 1)
ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics=['accuracy']) # binary cross-entropy loss for a single-output classifier
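The sigmoid output and the binary cross-entropy loss can be reproduced with plain NumPy. A hedged sketch (the helper names are mine, not Keras API):

```python
import numpy as np

def sigmoid(z):
    # squashes any pre-activation into (0, 1), read as churn probability
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p):
    # the per-sample loss behind loss='binary_crossentropy'
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

p = sigmoid(0.0)
print(p)                          # 0.5: zero pre-activation = maximal uncertainty
print(binary_crossentropy(1, p))  # -log(0.5) ~ 0.693
```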
ann.fit(X_train, y_train, batch_size=32, epochs=20) # epoch = one full pass of the training data through the network
Epoch 1/20
250/250 [==============================] - 2s 2ms/step - loss: 0.6592 - accuracy: 0.6317
Epoch 2/20
250/250 [==============================] - 1s 2ms/step - loss: 0.5111 - accuracy: 0.7936
Epoch 3/20
250/250 [==============================] - 1s 2ms/step - loss: 0.4641 - accuracy: 0.7913
Epoch 4/20
250/250 [==============================] - 1s 2ms/step - loss: 0.4230 - accuracy: 0.8056
Epoch 5/20
250/250 [==============================] - 1s 2ms/step - loss: 0.3967 - accuracy: 0.8294
Epoch 6/20
250/250 [==============================] - 1s 2ms/step - loss: 0.3793 - accuracy: 0.8437
Epoch 7/20
250/250 [==============================] - 1s 2ms/step - loss: 0.3568 - accuracy: 0.8510
Epoch 8/20
250/250 [==============================] - 1s 2ms/step - loss: 0.3619 - accuracy: 0.8479
Epoch 9/20
250/250 [==============================] - 1s 2ms/step - loss: 0.3511 - accuracy: 0.8605
Epoch 10/20
250/250 [==============================] - 1s 2ms/step - loss: 0.3481 - accuracy: 0.8581
Epoch 11/20
250/250 [==============================] - 1s 2ms/step - loss: 0.3466 - accuracy: 0.8567
Epoch 12/20
250/250 [==============================] - 1s 2ms/step - loss: 0.3536 - accuracy: 0.8561
Epoch 13/20
250/250 [==============================] - 1s 2ms/step - loss: 0.3524 - accuracy: 0.8528
Epoch 14/20
250/250 [==============================] - 1s 2ms/step - loss: 0.3374 - accuracy: 0.8655
Epoch 15/20
250/250 [==============================] - 1s 2ms/step - loss: 0.3508 - accuracy: 0.8595
Epoch 16/20
250/250 [==============================] - 1s 2ms/step - loss: 0.3496 - accuracy: 0.8572
Epoch 17/20
250/250 [==============================] - 1s 3ms/step - loss: 0.3422 - accuracy: 0.8600
Epoch 18/20
250/250 [==============================] - 1s 2ms/step - loss: 0.3488 - accuracy: 0.8567
Epoch 19/20
250/250 [==============================] - 1s 2ms/step - loss: 0.3518 - accuracy: 0.8583
Epoch 20/20
250/250 [==============================] - 1s 2ms/step - loss: 0.3367 - accuracy: 0.8605
y_pred = ann.predict(X_test)
y_pred = (y_pred > 0.5)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))
[[0 0]
[0 1]
[0 0]
...
[0 0]
[0 0]
[0 0]]
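The thresholding step above turns probabilities into class labels: any sigmoid output above 0.5 is predicted as a churner. A tiny sketch with invented probabilities:

```python
import numpy as np

probs = np.array([[0.10], [0.72], [0.49]])  # hypothetical sigmoid outputs
preds = (probs > 0.5)                       # same thresholding as above
print(preds.ravel())  # [False  True False]
```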
Evaluation of the model
from sklearn.metrics import accuracy_score, confusion_matrix
accuracy_score(y_test,y_pred) # overall fraction of correct predictions on the test set
confusion_matrix(y_test, y_pred)
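The two metrics are consistent with each other: accuracy equals the trace of the confusion matrix (the correct predictions on the diagonal) divided by the total sample count. A sketch on a hypothetical label vector:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = np.array([0, 0, 1, 1, 0, 1])
y_hat  = np.array([0, 1, 1, 0, 0, 1])

cm = confusion_matrix(y_true, y_hat)   # rows: true class, cols: predicted class
acc = accuracy_score(y_true, y_hat)
print(cm)
print(acc, cm.trace() / cm.sum())      # both 4/6
```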