Data preparation for susceptibility mapping: Part I

For susceptibility mapping, you first need to extract point values from the causative-factor raster maps in ArcGIS. Do this for both presence points (landslide locations, value 1) and absence points (non-landslide locations, value 0). Then export those values as a .txt file.

import pandas as pd

read_file = pd.read_csv('/work/land_poi.txt', delimiter=',')  # read the .txt file as csv
read_file.to_csv('/work/land_poi.csv', index=False)  # convert it and save it as a csv file

df_land = pd.read_csv('/work/land_poi.csv')  # read the csv file saved above
df_land.head()

# change the names of all columns
df_land.columns = ['ID', 'Land_poi', 'X', 'Y', 'DEM', 'ASPECT', 'SPI', 'LULC', 'GEO',
                   'FAULT', 'RAIN', 'RIVER_BUFF', 'ROAD_BUFF', 'CURVA', 'TWI', 'SLOPE']

df_land.head() #names are changed now

# add a new column for the dependent factor: landslide presence, with a value of 1
df_land['Landslide'] = 1
df_land.head()

df_land=df_land.drop(['ID'], axis = 1) #drop extra column

df_land.to_csv('/work/land_poi.csv', index=False)  # save the changed data frame to csv again; this overwrites the existing file

# read the newly saved csv file
df_land = pd.read_csv('/work/land_poi.csv')
df_land.head()  # all changes made are now saved in the csv file

# now load the .txt file of non-landslide point values extracted from all factor raster maps
read_file = pd.read_csv('/work/non_land_poi.txt', delimiter=',')
read_file.to_csv('/work/non_land_poi.csv', index=False)

df_non_land = pd.read_csv('/work/non_land_poi.csv')
df_non_land.head()

# rename the columns to match the df_land columns, because the two data frames will be joined later
df_non_land.columns = ['ID', 'Land_poi', 'X', 'Y', 'DEM', 'ASPECT', 'SPI', 'LULC', 'GEO',
                       'FAULT', 'RAIN', 'RIVER_BUFF', 'ROAD_BUFF', 'CURVA', 'TWI', 'SLOPE']

df_non_land.head()

# add a new column for the dependent factor: landslide absence, with a value of 0
df_non_land['Landslide'] = 0  # the column name must be the same as in df_land
df_non_land.head()

df_non_land=df_non_land.drop(['ID'], axis = 1) #drop extra column

df_non_land.to_csv('/work/non_land_poi.csv', index=False)  # overwrite the csv with the changed data frame

# read the newly saved csv file
df_non_land = pd.read_csv('/work/non_land_poi.csv')
df_non_land.head()
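The landslide and non-landslide files go through the same sequence: read, rename the columns, add the label, drop ID. As a rough sketch, that sequence could be wrapped in one function. The helper name `prepare_points`, the `COLUMNS` constant, and the two-row demo text are hypothetical and only stand in for the real exported files; the column names are the ones used above.

```python
import io
import pandas as pd

# Column names taken from the rename step in this tutorial.
COLUMNS = ['ID', 'Land_poi', 'X', 'Y', 'DEM', 'ASPECT', 'SPI', 'LULC', 'GEO',
           'FAULT', 'RAIN', 'RIVER_BUFF', 'ROAD_BUFF', 'CURVA', 'TWI', 'SLOPE']


def prepare_points(source, label):
    """Hypothetical helper: read exported points, rename columns, add label, drop ID."""
    df = pd.read_csv(source, delimiter=',')
    # Guard against an export with a different number of columns than expected.
    if len(df.columns) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(df.columns)}")
    df.columns = COLUMNS
    df['Landslide'] = label  # 1 for landslide points, 0 for non-landslide points
    return df.drop(columns=['ID'])


# Made-up two-row stand-in for an exported .txt file (values are illustrative only).
header = ','.join(f'c{i}' for i in range(len(COLUMNS)))
rows = [','.join(str(r * 100 + i) for i in range(len(COLUMNS))) for r in range(2)]
demo_txt = '\n'.join([header] + rows)

demo_land = prepare_points(io.StringIO(demo_txt), label=1)
print(demo_land.shape)
```

With the real files, the same call would presumably take a path instead of the in-memory text, for example `prepare_points('/work/land_poi.txt', label=1)`.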

df_land_nonland = pd.concat([df_land, df_non_land], ignore_index=True, sort=False)
df_land_nonland

Now all point values for landslide and non-landslide locations are in one data frame. The last column (Landslide), with its 0 and 1 values, is stacked together as well. You can check the total number of rows and confirm that it equals the sum of the rows in the landslide and non-landslide data frames.

df_land.info() # we got 120 points

df_non_land.info() # 120 so combining these two we will get 240 rows

df_land_nonland.info()
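Instead of reading the counts off the `info()` output, the same check can be written as an explicit assertion. This is a minimal sketch on small made-up frames; `demo_land`, `demo_non_land`, and `demo_all` are hypothetical stand-ins for `df_land`, `df_non_land`, and `df_land_nonland`.

```python
import pandas as pd

# Small stand-ins for df_land and df_non_land (3 and 2 rows of made-up values).
demo_land = pd.DataFrame({'DEM': [10, 20, 30], 'Landslide': 1})
demo_non_land = pd.DataFrame({'DEM': [40, 50], 'Landslide': 0})
demo_all = pd.concat([demo_land, demo_non_land], ignore_index=True, sort=False)

# The combined frame should have as many rows as the two parts together.
assert len(demo_all) == len(demo_land) + len(demo_non_land)

# Count how many rows carry each label (1 = landslide, 0 = non-landslide).
class_counts = demo_all['Landslide'].value_counts().to_dict()
print(class_counts)
```

Applied to the real frames, the per-label counts should match the 120 landslide and 120 non-landslide points reported above.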

# shuffle the rows of this data frame
df_land_nonland = df_land_nonland.sample(frac=1)
# print the shuffled data frame
print("\nShuffled DataFrame:")
print(df_land_nonland)
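`sample(frac=1)` gives a different row order on every run. If a repeatable order is wanted, one option is to pass `random_state`, as sketched below on a made-up frame. The seed value 42 and the names `demo`, `shuffled`, and `shuffled_again` are arbitrary choices for illustration, not part of the original workflow.

```python
import pandas as pd

# Made-up frame standing in for df_land_nonland.
demo = pd.DataFrame({'DEM': range(6), 'Landslide': [1, 1, 1, 0, 0, 0]})

# random_state fixes the permutation so reruns give the same order;
# reset_index(drop=True) replaces the shuffled index with 0..n-1.
shuffled = demo.sample(frac=1, random_state=42).reset_index(drop=True)
shuffled_again = demo.sample(frac=1, random_state=42).reset_index(drop=True)

print(shuffled.equals(shuffled_again))
```

In this tutorial the index is discarded anyway when the frame is written to csv without an index and read back, so `reset_index` mainly matters if the shuffled frame is used directly.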

# save the combined data frame to a new csv file
df_land_nonland.to_csv('/work/df_land_nonland.csv', index=False)

# read the newly saved csv file of the combined data frame
df_land_nonland = pd.read_csv('/work/df_land_nonland.csv')
df_land_nonland.head()

We now have a single data frame containing both landslide and non-landslide points. This data frame will be used to train the machine learning model. For prediction, we need the values of all these factors across the whole study area (one value per pixel). The model trained on the training data frame is then applied to that prediction data frame. Preparing the prediction data frame will be covered in the next tutorial.
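Before training, the combined data frame is typically split into predictor columns and the label column. The sketch below shows one way to do that on a made-up frame. It assumes that the identifier (`Land_poi`) and coordinate (`X`, `Y`) columns are not used as predictors, which this tutorial does not state; the names `demo`, `non_predictors`, `features`, and `target` are hypothetical.

```python
import pandas as pd

# Made-up frame with a subset of the tutorial's columns (values are illustrative only).
demo = pd.DataFrame({
    'Land_poi': [1, 2, 3, 4],
    'X': [0.0, 1.0, 2.0, 3.0],
    'Y': [0.0, 1.0, 2.0, 3.0],
    'DEM': [100, 200, 300, 400],
    'SLOPE': [5, 15, 25, 35],
    'Landslide': [1, 0, 1, 0],
})

# Assumption: identifier and coordinate columns are not used as predictors.
non_predictors = ['Land_poi', 'X', 'Y', 'Landslide']
features = demo.drop(columns=non_predictors)  # causative factors only
target = demo['Landslide']                    # 1 = landslide, 0 = non-landslide

print(list(features.columns))
```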