AI Project Report - Udacity Driving Simulator
To read this notebook in a better format go to this link: https://deepnote.com/@report/AI-dfa470f5-a632-4756-a7b6-1c6dadddfdf2
In this document you’ll find our report for the AI Project – Autonomous Driving Car. Our group used two different methods to train and test a model that can drive a vehicle autonomously in the Udacity driving simulator. The first method uses OpenCV and the second uses deep learning. We will present all our successes, failures, and frustrations alongside everything we learned and improved upon in this project.
We first began with OpenCV image filtering techniques. We gathered lots of data (images) using the simulator’s integrated recording tools, which was quite convenient for us. We initially used grayscale conversion, Canny edge detection, and Gaussian blur, then proceeded to use the HoughLines() function from OpenCV, which automatically detected the lines. Using image filtering was quite tedious at first, but with a lot of trial and error we managed to make it detect the borders of the track and had our model drive on its own.
Afterwards we used deep learning techniques in combination with Keras and Flask. Keras trained our model quite well and it completed a full lap much more easily than the OpenCV approach, although it did take a considerable amount of time to train. Flask enabled us to control the simulated car with some basic functions.
OpenCV
We import a variety of libraries to help us with the OpenCV part. The most important, however, is the cv2 library, as it gives us many powerful tools for applying different image adjustments.
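A representative set of imports for this part (the exact list in our notebook is slightly longer):

```python
import cv2                  # OpenCV: filtering, edge and line detection
import numpy as np          # array math for masks and line averaging
from PIL import ImageGrab   # screen capture of the simulator window
```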
Image Processing
To begin, we created two new methods that help with processing our images before we attempt to detect the lines at the sides of the track.
We define a function named roi() that takes two arguments: video (our image) and vertices. The function is created to help with the processing of images later on by applying a region of interest (ROI) mask to an image. The first line creates a mask of zeros with the same dimensions as the input video. The cv2.fillPoly() function is then used to fill the mask with white pixels at the coordinates specified by vertices (which are defined later on). This creates a mask that is white for the region of interest and black everywhere else. Combining this with our canny edge detection, we should only see white lines within this ROI. The cv2.bitwise_and() function is then used to apply the mask to the input video. This results in the original image being masked, with everything outside of the region of interest set to black.
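A minimal sketch of such a roi() function (the fill colour is adapted to the number of channels):

```python
def roi(video, vertices):
    # Black mask with the same shape as the input frame
    mask = np.zeros_like(video)
    # White fill, for single- or multi-channel frames
    fill = 255 if video.ndim == 2 else (255,) * video.shape[2]
    cv2.fillPoly(mask, vertices, fill)
    # Keep only the pixels inside the region of interest
    return cv2.bitwise_and(video, mask)
```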
Here we define another function named mask() that takes in the video (image) parameter. The function is meant to apply a color mask to the input image to filter out all colors except white and yellow. The first four lines define the lower and upper bounds for the white and yellow colors that we want to keep in the image. These values are specified in the HSL (Hue, Saturation, Lightness) color space as arrays of three uint8 values representing the hue, saturation, and lightness of the colors. The cv2.inRange() function is then used to create two masks, one for white and one for yellow, using the specified lower and upper bounds. These masks are binary images where white pixels represent the colors within the specified range and black pixels represent all other colors. The two masks are then added together, resulting in a combined mask that filters out all colors except white and yellow. The cv2.bitwise_and() function is then used to apply this combined mask to the input image.
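A sketch of the mask() function; the threshold values shown here are illustrative, as the real bounds were tuned by hand:

```python
def mask(video):
    # Illustrative HLS bounds for white and yellow lane markings
    lower_white  = np.array([0, 200, 0],     dtype=np.uint8)
    upper_white  = np.array([255, 255, 255], dtype=np.uint8)
    lower_yellow = np.array([10, 0, 100],    dtype=np.uint8)
    upper_yellow = np.array([40, 255, 255],  dtype=np.uint8)

    white_mask  = cv2.inRange(video, lower_white, upper_white)
    yellow_mask = cv2.inRange(video, lower_yellow, upper_yellow)
    combined    = cv2.add(white_mask, yellow_mask)       # keep either colour
    return cv2.bitwise_and(video, video, mask=combined)  # black out everything else
```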
This is the overall pipeline created to process the image stream (or video) before applying line detection. We will go over each line and explain the different methods we used to turn our image into a usable processed image for line detection using Canny edge detection.
The function begins by defining a set of 8 vertices that represent the corners of a polygon. This polygon is used as a region of interest (ROI) with the roi() function created earlier. Next, the image is converted from the default BGR color space to the HLS color space using the cv2.cvtColor function. The image is then passed through our previously created mask() function to filter out the different colors besides yellow and white.
We pass the image through our roi() function, which applies the previously defined polygon as a mask to the image, effectively cropping the image to the region within the polygon.
After that, we blur the image with a Gaussian blur using a 5x5 kernel via the cv2.GaussianBlur function, also specifying the standard deviation along the x and y axes, to reduce noise and smooth out the image.
The image is then passed through the Canny edge detector, which is used to find edges in an image. The edge detector uses two threshold values to decide which pixels are edges and which are not. In this case, the threshold values are set to 30 and 100. Finally, the image is overlaid with a circle of 10 pixels as its radius, at the center of the image using the cv2.circle function.
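Put together, the processing pipeline looks roughly like this; the ROI vertices and blur sigma are illustrative stand-ins for the values we tuned:

```python
def process_img(video):
    h, w = video.shape[:2]
    # Eight illustrative corner points for the region of interest polygon
    vertices = np.array([[(0, h), (0, int(h * 0.6)), (int(w * 0.2), int(h * 0.45)),
                          (int(w * 0.8), int(h * 0.45)), (w, int(h * 0.6)), (w, h),
                          (int(w * 0.7), h), (int(w * 0.3), h)]], dtype=np.int32)

    hls      = cv2.cvtColor(video, cv2.COLOR_BGR2HLS)  # BGR -> HLS colour space
    filtered = mask(hls)                               # keep only white and yellow
    cropped  = roi(filtered, vertices)                 # apply the polygon ROI
    blurred  = cv2.GaussianBlur(cropped, (5, 5), 1)    # 5x5 kernel, illustrative sigma
    edges    = cv2.Canny(blurred, 30, 100)             # low/high thresholds 30 and 100
    cv2.circle(edges, (w // 2, h // 2), 10, 255, -1)   # 10 px marker at the image centre
    return edges
```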
Here we are grabbing a screenshot of the screen and then processing it. The bbox parameter specifies the dimensions of the area of the screen to capture. The processed image is then stored in a variable.
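In code this amounts to a single grab-and-process step; the bbox values are illustrative and depend on where the simulator window sits on screen:

```python
# Capture the simulator window and convert from PIL's RGB to OpenCV's BGR order
screen = cv2.cvtColor(np.array(ImageGrab.grab(bbox=(0, 40, 800, 640))), cv2.COLOR_RGB2BGR)
processed_screen = process_img(screen)
```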
Determining Lines
To determine the lines and calculate the distance the car has to keep from the sides, we use a host of different functions which call each other to keep the eventual pipeline simple. To explain all of the code, we will go through the functions' code snippets in the order they are used and describe their functionality.
First we grab the image using the earlier discussed ImageGrab and afterwards we process the image with our plethora of filters.
Next, we use the HoughLinesP method to detect lines in the image that have a minimum length of 10 pixels and a maximum gap of 15 pixels between adjacent points on a line. The detected lines are returned by the method as an array of points defining the lines. HoughLinesP is based on the Hough Transform, which is designed to identify straight lines. This is also the reason we went through such exhaustive image processing steps beforehand, as the Hough Transform usually works best after the edges have been detected with something like Canny edge detection. Hough lines are computed in what is called Hough space, which typically uses polar coordinates and therefore describes a line with different parameters than its simple endpoints. The two parameters of a straight line are (ρ, θ), with ρ being the shortest (perpendicular) distance from the origin to the line and θ the angle between the X-axis and that distance line. Unlike the Cartesian slope-intercept form, this representation can also describe vertical lines with just two parameters.
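The call itself looks roughly like this; the rho, theta, and vote-threshold values are illustrative, while the length and gap values are the ones mentioned above:

```python
lines = cv2.HoughLinesP(
    processed_screen,    # Canny edge image from the processing pipeline
    1,                   # rho: distance resolution in pixels (illustrative)
    np.pi / 180,         # theta: angular resolution of 1 degree (illustrative)
    20,                  # minimum number of votes for a line (illustrative)
    minLineLength=10,    # shortest segment we accept
    maxLineGap=15,       # largest gap between joinable segments
)
```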
Technically we have now detected the lines; however, these are all of the different lines present in the image that satisfy the Hough Lines parameters. Therefore we now make sure that the information gleaned from these lines is also processed into average lines, which are hopefully those at the sides of the road. On top of that we also display the lines on a separate image so that the user can track this application's performance in real time.
The ensuring_screen_shows(lines, screen1) function checks whether or not the lane_lines(lines) function returns proper average lines to use. If it does not, then the current image is simply passed along to the display while it checks the next image. If it turns out that there are valid average lines in the image, then this function calls the draw_lane_lines(screen1, lines) function to place the lines onto the image.
lane_lines() takes a set of lines as input and returns the coordinates of the two lane lines that best fit that set. It first checks whether the lines given as a parameter are actually valid, and if they are, the function tries to calculate the average slope and intercept using the average_slope_intercept(lines) function.
We first instantiate some variables to keep track of the lines and their weights followed by iterating over each line in the lines variable. For each line, we calculate the slope and intercept using the coordinates of the two endpoints. We also try to calculate the length of the line using the Pythagorean theorem. Now we need to check if the line is a left line or a right line based on the slope and the x-coordinates of the endpoints of the line. If the line is a left line, it adds the line's slope and intercept to the list of left lines, and its length to the left weights. It does this the same way for the right line. After all the lines have been processed, we calculate the average slope and intercept of the left lines and the right lines using the np.dot() function. We follow this up by using a moving average filter on the calculated averages to smooth them out.
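A condensed sketch of this averaging step; the left/right split here uses only the sign of the slope, and the moving-average smoothing mentioned above is applied afterwards:

```python
def average_slope_intercept(lines):
    left_lines,  left_weights  = [], []   # (slope, intercept) pairs and segment lengths
    right_lines, right_weights = [], []

    for line in lines:
        for x1, y1, x2, y2 in line:
            if x2 == x1:
                continue                                   # skip vertical segments
            slope     = (y2 - y1) / (x2 - x1)
            intercept = y1 - slope * x1
            length    = np.sqrt((y2 - y1) ** 2 + (x2 - x1) ** 2)
            if slope < 0:                                  # left lane line
                left_lines.append((slope, intercept))
                left_weights.append(length)
            else:                                          # right lane line
                right_lines.append((slope, intercept))
                right_weights.append(length)

    # Length-weighted average of (slope, intercept) for each side
    left_lane  = np.dot(left_weights,  left_lines)  / np.sum(left_weights)  if left_weights  else None
    right_lane = np.dot(right_weights, right_lines) / np.sum(right_weights) if right_weights else None
    return left_lane, right_lane
```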
draw_lane_lines() takes an image and a set of lines, and then draws the lines on the image so that the user of this application can follow what the car is seeing. We first create a black image with the same size as the original image and then iterate over each line in the passed lines array. For each line, we check whether the line is valid and not empty (to handle the case where an empty lines array was provided), and then draw the line on the black image. Finally, we finish the image with cv2.addWeighted() to combine the original image and the image with the drawn lines.
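Roughly, with each line given as the pair of endpoints produced by make_line_points() (described below); the colour and blend weights are illustrative:

```python
def draw_lane_lines(image, lines, color=(0, 255, 0), thickness=10):
    line_image = np.zeros_like(image)            # black canvas, same size as the frame
    for line in lines:
        if line is not None:                     # skip sides where no line was found
            cv2.line(line_image, *line, color, thickness)
    # Blend the original frame with the drawn lines
    return cv2.addWeighted(image, 1.0, line_image, 0.95, 0.0)
```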
We created a simple moving-average function to help us with the problem of the many average lines. It takes three arguments: the current average value (avg), the latest added line value (new_value), and the number of lines to average over (N). If the current average value is 0, the function simply returns the latest line value. Otherwise, it uses the following formula to calculate the new average value:
```
avg -= avg / N
avg += new_value / N
```
This formula subtracts a fraction (1/N) of the current average value from the current average value, and adds a fraction (1/N) of the latest sample value to the current average value. The resulting value is the new average value. For example, if the current average value is 10, the latest value is 5, and N is 5, the new average will be calculated as follows:
```
avg = 10
avg -= avg / 5   # avg = 10 - 2 = 8
avg += 5 / 5     # avg = 8 + 1 = 9
# overall: avg = 10 - 2 + 1 = 9
```
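Putting the two cases together, the helper looks roughly like this:

```python
def moving_average(avg, new_value, N):
    if avg == 0:
        return new_value      # no history yet: start from the new sample
    avg -= avg / N            # drop 1/N of the old average
    avg += new_value / N      # blend in 1/N of the new sample
    return avg
```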
To draw the lines on our image for the user to follow, we need to figure out the endpoints of the two separate lines. make_line_points() does exactly that by taking two Y-coordinates and a line (either the left or the right one), after first checking that the line is valid and not empty. It then slices the line into its separate slope and intercept values. Since cv2.line() (the function we use to draw the lines on the image) requires integer coordinates, we also apply an int() conversion just to be safe. We obtain the X-coordinate by subtracting the intercept from the Y-coordinate and dividing by the slope of the line.
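In short (a sketch):

```python
def make_line_points(y1, y2, line):
    if line is None or len(line) == 0:
        return None
    slope, intercept = line
    # x = (y - intercept) / slope, converted to ints as cv2.line() requires
    x1 = int((y1 - intercept) / slope)
    x2 = int((y2 - intercept) / slope)
    return ((x1, int(y1)), (x2, int(y2)))
```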
Now that we have found the lines at the sides of the road, it's time to figure out where their X-coordinates are relative to the middle X-coordinate of our car, so that we may control the distance between the lines and our car.
The ensuring_lanes() function is relatively simple and acts mostly as a gate: it checks whether the currently calculated lines are valid and exist. If they do exist, it calls the middle_xcoordinate() function. If they do not, the X-coordinates are given None or empty values so that other functions can't act until they are overridden with proper values.
The function above calculates the average slope and intercept of both lines using the previously explained average_slope_intercept() function. Then, we define a Y-coordinate that corresponds to the middle of the image. Next, we use the get_middle_xcoordinate() function to calculate the X-coordinate of the middle point of each line. Afterwards, we return the X-coordinates of both lines.
The function first checks if the line is valid and not None. If the line isn't valid, our function will return None. Otherwise, it unpacks the slope and intercept values from the line, and calculates the X-coordinate of the point on the line that has the same Y-coordinate as the input. We then calculate the X-coordinate much like how we calculated the X-coordinates earlier from the outer points of the line. In the end we convert the X-coordinate to an integer manually to be certain, as the cv2.line() function requires it.
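A minimal sketch of get_middle_xcoordinate():

```python
def get_middle_xcoordinate(y, line):
    if line is None:
        return None
    slope, intercept = line
    return int((y - intercept) / slope)   # x on the line at the given y
```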
Originally, much of our code was calibrated using the testing images we received from the Udacity simulator itself, which positions the camera in front of the car. Since we could not get Flask to work properly with OpenCV, we simply had to work around the fact that our car remains in the middle of the screen. Even then, we are happy with the performance of our program.
Driving the Car
This is a general group of functions to control the car with. ReleaseKey() and PressKey() come from a separate Python file that we found online, which incorporates C code. It was a bit too advanced an inclusion to try to write ourselves.
We implement a function to control the car by using the two outward lanes (or in our case, lines). The left middle and right middle X-coordinates are used to determine whether the car is aligned with the lines or whether it has to turn towards one of them. If the difference between the positions of the lines is within the limits we instantiated, the car continues driving straight using the previously shown straight() function. If the difference between the two lines is greater than the upper limit, the car needs to turn right as it is nearing the left lane. When it is smaller than the lower limit, the car is nearing the right side of the road (towards the right lane) and needs to turn left using the previously created left() function. During each of these maneuvers we also call a slow_down() function to make sure the car doesn't fly off the road.
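A rough sketch of this decision logic; the key scan codes, pixel limits, and the simplified alignment measure (lane midpoint relative to the screen centre) are illustrative stand-ins for the values and comparison our code actually uses:

```python
from directkeys import PressKey, ReleaseKey  # key-press helper we found online (module name illustrative)

W, A, S, D = 0x11, 0x1E, 0x1F, 0x20          # scan codes for the W, A, S, D keys

def straight():
    PressKey(W)
    ReleaseKey(A)
    ReleaseKey(D)

def left():
    PressKey(A)
    ReleaseKey(D)

def right():
    PressKey(D)
    ReleaseKey(A)

def slow_down():
    ReleaseKey(W)

def lane_control(left_middle_x, right_middle_x, screen_centre_x=400):
    lower_limit, upper_limit = -40, 40       # illustrative alignment band, in pixels
    offset = (left_middle_x + right_middle_x) // 2 - screen_centre_x
    if lower_limit <= offset <= upper_limit:
        straight()                           # car sits nicely between the two lines
    elif offset > upper_limit:               # midpoint is to our right: drifting towards the left line
        slow_down()
        right()
    else:                                    # midpoint is to our left: drifting towards the right line
        slow_down()
        left()
```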
Much like the previous control function, this function allows us to control the car by looking at only one line, namely the left line. This function should only be activated once the right line is not valid. The idea remains the same: at the start we instantiate some limits which determine how aligned our car is with this line and whether it needs to turn left or right. If the position of the left line is within the range between the two limits, we can assume the car is still going straight, aligned with the lines on either side of the road, and we therefore continue going straight. If the position is greater than the upper limit, the vehicle is veering too much towards the right side and we call the right() function to turn it slightly right. The lower limit check works the opposite way.
This is basically the same function as left_lane_control(), but obviously targeted at the single right line. We once again instantiate the limits and check the position to see whether or not the car is aligned with the known line. If the position of the car is higher than the upper limit, we tell it to turn right as it is too far away from the line, meaning it is veering towards the left side of the road. If the position is smaller than the lower limit, the car is getting too close to the right side of the road and we tell it to turn left using the left() function.
Deep Learning
After completing the OpenCV method of this project, we moved on to the deep learning method, where we train a CNN model that will control the car and drive it around the track. In this part you will find how we gathered the data used to train the model, the strategy and method we used to train it, and finally how we used that model to actually control the Udacity simulator.
Data Gathering
Udacity’s driving simulator has a practical data-recording tool built right into the application. We simply selected a folder and hit record, and the data was then based on the manual driving we did on the track. After a few rounds we decided to train a model, but soon realized that this simple method wouldn’t provide enough variety to train the model successfully. So, after recording simple laps on the track, we decided to add more data for the tricky parts: bends, sharp turns, missing lines on the track, and, hardest to recognize, the shadows cast on the road by objects such as trees and boulders. Driving slower at those locations proved to help the model better understand what it needed to do. Driving backwards on the track also improved our model, as it added a new perspective to the data. However, what we found particularly useful was to drive a few laps hugging each side of the lane, on the left and on the right, as tightly as possible so as to set certain “boundaries” for the model to recognize. Doing so drastically improved the model, which then flawlessly completed the lap.
Training the Model
After recording the training data on the track that we want the car to drive, it's time to train the model. The main strategy is for the model to predict, from an image, a number that represents the angle the car should steer towards. While the car is driving, it takes the current frame and passes it to the model, which produces the angle; that angle is then sent back to the controls.
After all the modules are imported, it's time to process the training data that was gathered in the data gathering step. That step produces two items: a folder with all the images and a driving_log.csv file that holds all the metadata of the images. First, all the image metadata is loaded into a DataFrame and labeled correctly. Then the DataFrame is set to have unlimited column width to make sure that none of the path data is cut off.
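Roughly (the path is illustrative and the column names follow the layout of the Udacity log):

```python
import pandas as pd

columns = ['center', 'left', 'right', 'steering', 'throttle', 'brake', 'speed']
data = pd.read_csv('driving_log.csv', names=columns)   # path to the recorded log (illustrative)
pd.set_option('display.max_colwidth', None)            # keep the full image paths visible
```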
In driving_log.csv, the center, left and right values are full paths to the corresponding images. Because the directory part of this information is redundant, it is removed. This is done by making a function called `remove_head` that splits the original string on "\" and takes the last element, returning only the relevant file name of the image.
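A minimal sketch of this clean-up step:

```python
def remove_head(path):
    # Keep only the file name: the last element after splitting on the Windows separator
    return path.split('\\')[-1]

for column in ['center', 'left', 'right']:
    data[column] = data[column].apply(remove_head)
```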
Then a histogram of the steering angles in the data DataFrame is made using the numpy histogram() function. It takes the data to be plotted and the number of bins to use as inputs, and returns the histogram counts and the bin edges as outputs. The center of each bin is then computed by averaging its two edges, and the resulting array is assigned to the center variable. Next, the histogram data is plotted as a bar chart using the matplotlib bar() function.
After the histogram has been plotted, the code uses the matplotlib plot() function to draw a horizontal line across the plot. The plot() function takes the x-coordinates and y-coordinates of the line as inputs, in which the minimum and maximum steering angles in the data DataFrame are used as the x-coordinates of the line. The code uses the samples_per_bin variable to specify the y-coordinate of the line.
Then we remove the excess data from the data DataFrame in order to balance the distribution of steering angles in the data. First the total number of data points in the data DataFrame is printed. Then a for loop is used to iterate over the bins in the histogram and select a random subset of data points in each bin that have steering angles within the bin range. The selected data points are then added to a list of data points to be removed, and the excess data points are removed from the data DataFrame using the pandas drop() method.
Finally, the numpy histogram() function is used to generate a new histogram of the steering angles in the modified data DataFrame, and the matplotlib bar() and plot() functions are used to plot the histogram and a horizontal line at the desired number of samples per bin, respectively. This allows us to visualize the new distribution of steering angles in the data DataFrame and verify that the data has been balanced.
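A condensed sketch of the histogram and balancing step; the number of bins and the samples-per-bin cap are illustrative:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.utils import shuffle

num_bins = 25
samples_per_bin = 400                                  # illustrative cap per bin

hist, bins = np.histogram(data['steering'], num_bins)
center = (bins[:-1] + bins[1:]) * 0.5
plt.bar(center, hist, width=0.05)
plt.plot((data['steering'].min(), data['steering'].max()),
         (samples_per_bin, samples_per_bin))           # the cut-off line

print('total data:', len(data))
remove_list = []
for j in range(num_bins):
    # indices of all samples whose angle falls inside this bin
    bin_idx = [i for i in range(len(data['steering']))
               if bins[j] <= data['steering'][i] <= bins[j + 1]]
    bin_idx = shuffle(bin_idx)                         # a random subset survives
    remove_list.extend(bin_idx[samples_per_bin:])
data.drop(data.index[remove_list], inplace=True)
print('remaining:', len(data))

# Re-plot to verify the balanced distribution
hist, _ = np.histogram(data['steering'], num_bins)
plt.bar(center, hist, width=0.05)
plt.plot((data['steering'].min(), data['steering'].max()),
         (samples_per_bin, samples_per_bin))
```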
Here the paths to the images and the steering angles are extracted from the data DataFrame. A load_img_steering() function is defined that takes the directory containing the images and the data DataFrame as inputs, and returns the paths to the images and the steering angles as arrays.
The function begins by initializing empty lists called image_path and steering that will be used to store the paths to the images and the steering angles, respectively. Then a for loop is used to iterate over the rows of the data DataFrame, and for each row it extracts the paths to the center, left, and right images and the corresponding steering angle.
For each image, the code uses the os.path.join() function to combine the directory containing the images (specified by the datadir parameter) with the path to the image and append the result to the image_path list. It then adds the corresponding steering angle to the steering list.
For the left and right images, the code adds an additional offset to the steering angle to account for the camera offset. This offset is specified as a constant value of 0.15 in the code.
After all of the rows in the data DataFrame have been processed, the image_path and steering lists are converted to numpy arrays using the numpy asarray() function and returned as the output of load_img_steering(); this is done so that they are compatible with the training format used later.
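A sketch of load_img_steering(); the folder name is illustrative, while the 0.15 camera offset matches the description above:

```python
import os

def load_img_steering(datadir, data):
    image_path, steering = [], []
    for i in range(len(data)):
        row = data.iloc[i]
        angle = float(row['steering'])
        # Centre camera: use the recorded angle as-is
        image_path.append(os.path.join(datadir, row['center'].strip()))
        steering.append(angle)
        # Left camera: steer a bit more to the right
        image_path.append(os.path.join(datadir, row['left'].strip()))
        steering.append(angle + 0.15)
        # Right camera: steer a bit more to the left
        image_path.append(os.path.join(datadir, row['right'].strip()))
        steering.append(angle - 0.15)
    return np.asarray(image_path), np.asarray(steering)

image_paths, steerings = load_img_steering('IMG', data)   # image folder name is illustrative
```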
Finally, the load_img_steering() function is called and the returned arrays are stored in the image_paths and steerings variables. These arrays contain the paths to the images and the corresponding steering angles, respectively. This lets us process the training data before actually loading the images.
Then the previously made variables image_paths and steerings are split into training and validation sets with a ratio of 0.8 and 0.2, respectively. The image_paths are treated as the training inputs while the steerings are treated as the target outputs, reflecting the fact that the model has to predict the right steering angle from the images that will be fed to it.
To ensure that the data are evenly split and to reduce bias in the model, the histograms of both the training and validation sets are plotted and examined. This allows us to train the model on the cleanest data possible.
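For example (a sketch, with an arbitrary random seed):

```python
from sklearn.model_selection import train_test_split

X_train, X_valid, y_train, y_valid = train_test_split(
    image_paths, steerings, test_size=0.2, random_state=6)

# Quick visual check that both sets have a similar angle distribution
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].hist(y_train, bins=25, color='blue')
axes[0].set_title('Training set')
axes[1].hist(y_valid, bins=25, color='red')
axes[1].set_title('Validation set')
```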
Next, a few functions are made that augment the images. Each takes an image as a parameter, applies some alteration to it, and returns the altered image.
First comes a zoom() function that uses the imgaug library to zoom in on an image. It takes an image as input and returns the zoomed-in version of the image.
Next is a pan() function that uses the imgaug library to pan an image. It takes an image as input and returns the panned version. The pan() function first creates an Affine object, which applies a translation (pan) to the image along the x and y axes; the pan is then applied to the image and the result returned.
This code defines an img_random_brightness() function that uses the imgaug library to randomly adjust the brightness of an image. The function takes an image as input and returns the modified version of the image.
This code defines an img_random_flip() function that uses the cv2 library to randomly flip an image horizontally. It takes an image and a steering angle as inputs and returns the flipped version of the image and the corresponding flipped steering angle.
Finally, a function is made that applies a series of random image augmentation techniques to an input image. It takes an image path and a steering angle as inputs and returns the augmented version of the image and the corresponding steering angle. First the image is loaded using mpimg.imread(), which takes the path to the image file as input and returns the image data. Then, for every augmentation function defined previously, there is a 50% chance that it will be applied to the current image.
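A sketch of these augmentation helpers; the zoom, pan, and brightness ranges are illustrative values:

```python
import matplotlib.image as mpimg
from imgaug import augmenters as iaa

def zoom(image):
    # Zoom in by up to 30%
    return iaa.Affine(scale=(1.0, 1.3)).augment_image(image)

def pan(image):
    # Shift the frame by up to 10% along x and y
    return iaa.Affine(translate_percent={'x': (-0.1, 0.1), 'y': (-0.1, 0.1)}).augment_image(image)

def img_random_brightness(image):
    # Darken or brighten the frame by scaling pixel intensities
    return iaa.Multiply((0.2, 1.2)).augment_image(image)

def img_random_flip(image, steering_angle):
    # Mirror the frame and negate the angle so left turns become right turns
    return cv2.flip(image, 1), -steering_angle

def random_augment(image_path, steering_angle):
    image = mpimg.imread(image_path)
    if np.random.rand() < 0.5:
        image = pan(image)
    if np.random.rand() < 0.5:
        image = zoom(image)
    if np.random.rand() < 0.5:
        image = img_random_brightness(image)
    if np.random.rand() < 0.5:
        image, steering_angle = img_random_flip(image, steering_angle)
    return image, steering_angle
```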
After all the data augmentation functions are set up, another function is made to process the image into a form more suitable for training the model. First the function crops the image down to the relevant region. Then it converts it to the YUV color space, which separates luminance from chrominance and is often used in computer vision. The image is then blurred using a 3x3 Gaussian kernel; blurring helps to reduce noise and improve the performance of the machine learning model. Next, the cv2.resize() function resizes the image to a width of 200 pixels and a height of 66 pixels. This size is commonly used in deep learning models for autonomous driving because it provides a good balance between resolution and computational efficiency. Finally, the pixel values of the image are normalized by dividing them by 255. This ensures that the pixel values are in the range [0, 1], which is the range that most deep learning models expect.
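A sketch of this preprocessing function; the crop rows are illustrative (they remove the sky and the car's hood):

```python
def img_process(img):
    img = img[60:135, :, :]                      # crop to the road region (illustrative rows)
    img = cv2.cvtColor(img, cv2.COLOR_RGB2YUV)   # YUV colour space
    img = cv2.GaussianBlur(img, (3, 3), 0)       # light blur to suppress noise
    img = cv2.resize(img, (200, 66))             # input size expected by the network
    return img / 255.0                           # scale pixel values to [0, 1]
```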
This code defines a generator function, batch_generator, that yields batches of images and corresponding steering angles.
The generator first initializes empty lists for the images and steering angles in the current batch. It then enters an infinite loop that picks a random index into the input list, reads the image from the corresponding path, and processes it with the random_augment function if the generator is being used for training, or with just a normal image load if the generator is being used for validation/testing. The image is then passed through img_process, and the image and the corresponding steering angle are added to the current batch. When the batch is full, the generator yields the batch as a tuple of NumPy arrays. This process is then repeated for subsequent batches.
This allows the training process to produce an effectively infinite stream of training images in the specified batch size. It also allows the model to be trained on a very large dataset without having to load the entire dataset into memory; instead, only one batch needs to be held in memory at a time, as the data is generated per batch. A more in-depth explanation is provided in the training part of the model.
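A sketch of batch_generator():

```python
import random

def batch_generator(image_paths, steering_angles, batch_size, istraining):
    while True:                                  # Keras keeps pulling batches from this loop
        batch_img, batch_steering = [], []
        for _ in range(batch_size):
            random_index = random.randint(0, len(image_paths) - 1)
            if istraining:
                img, steering = random_augment(image_paths[random_index],
                                               steering_angles[random_index])
            else:
                img = mpimg.imread(image_paths[random_index])
                steering = steering_angles[random_index]
            batch_img.append(img_process(img))
            batch_steering.append(steering)
        yield np.asarray(batch_img), np.asarray(batch_steering)
```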
After all the necessary functions are defined, it's time to build the model. For the convolutional layers, an architecture similar to the one NVIDIA used to train their own self-driving cars is used. After a lot of experimentation, this turned out to be the convolutional architecture that produced the best model.
After the convolutional layers, the code flattens the output of the previous layers into a one-dimensional vector and feeds it into a series of dense, fully-connected layers. These dense layers apply non-linear transformations to the data to learn complex patterns in the input data. The output of the final dense layer is a single value that represents the predicted steering angle.
The model is then compiled. It uses mean squared error as its loss function, since the model is trying to predict a steering angle, which is a number, making this essentially a regression problem. A low learning rate is also used to try to improve the model's accuracy.
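A sketch of the model as described, with layer sizes following the NVIDIA design; the exact activations and learning rate in our notebook may differ slightly:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam

def nvidia_model():
    model = Sequential()
    # Convolutional feature extractor, modelled on NVIDIA's self-driving architecture
    model.add(Conv2D(24, (5, 5), strides=(2, 2), input_shape=(66, 200, 3), activation='elu'))
    model.add(Conv2D(36, (5, 5), strides=(2, 2), activation='elu'))
    model.add(Conv2D(48, (5, 5), strides=(2, 2), activation='elu'))
    model.add(Conv2D(64, (3, 3), activation='elu'))
    model.add(Conv2D(64, (3, 3), activation='elu'))
    # Fully connected head that regresses a single steering angle
    model.add(Flatten())
    model.add(Dense(100, activation='elu'))
    model.add(Dense(50, activation='elu'))
    model.add(Dense(10, activation='elu'))
    model.add(Dense(1))
    # Regression problem: mean squared error loss with a low learning rate
    model.compile(loss='mse', optimizer=Adam(learning_rate=1e-4))
    return model

model = nvidia_model()
model.summary()
```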
The model is then trained. First, the previously made batch generator function is supplied for the training set, with the X_train and y_train variables, which represent the arrays containing the image paths and their respective angles.
A batch size of 100 is then set, which means that 100 images are yielded by the generator function per step of the epoch. Because the steps per epoch is set to 300, there are 300 batches of 100 images per epoch. This is why a generator function is used to produce the training and validation data: the batch_generator function generates the data one batch at a time, which means that only a small amount of data needs to be stored in memory at any given time. This makes it possible to use a large number of images to train the model without having to worry about running out of memory.
The same applies to the validation set, where the previously defined X_valid and y_valid are used with a batch size of 100 as parameters of the batch generator. The istraining parameter is set to 0 to prevent augmentation of the images, and a validation step count of 200 is used, also to make sure that the amount of images loaded into memory is not too big.
Shuffle is set to 1 (True), which means the batches are shuffled before each epoch. Early stopping allows the model to train for up to 100 epochs, but stops once the monitored validation metric fails to improve for 3 epochs in a row, as indicated by the patience value. This is done to prevent overfitting and to make sure the model we keep is the best possible version it can be.
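Put together, the training call looks roughly like this (monitoring the validation loss for early stopping):

```python
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=3)   # stop once validation stops improving

history = model.fit(
    batch_generator(X_train, y_train, 100, istraining=1),    # 100 augmented images per step
    steps_per_epoch=300,
    epochs=100,
    validation_data=batch_generator(X_valid, y_valid, 100, istraining=0),
    validation_steps=200,
    callbacks=[early_stop],
    shuffle=1,
    verbose=1,
)
```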
To evaluate how the training went, the loss of the training and the validation dataset is plotted into a graph.
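For example:

```python
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.legend(['training', 'validation'])
plt.title('Loss')
plt.xlabel('Epoch')
```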
Because we trained this model countless times, a dynamic way of naming the model is used. The model is then saved and ready to be used to drive the car!
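For instance (the naming scheme is illustrative):

```python
from datetime import datetime

model_name = f"model_{datetime.now():%Y-%m-%d_%H-%M}.h5"
model.save(model_name)
```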
Driving the Car with the Model
Now that the model is ready, we need it to be able to interact with our Udacity simulator. To do this we use Flask to both control the car and receive the input image from which the model predicts the angle.
First the necessary modules are imported.
Second, a new instance of the Server class from the socketio module is made. This Server instance is then assigned to the sio variable. This allows real-time, bidirectional communication between the Python code and the Udacity simulator.
Then a Flask app instance is made and the speed limit is set to 20 to make sure that the car does not go too fast, which might result in crashing.
This is the main part of the code that allows the model to control the car. This function runs on every frame that the script receives. It first gets the current speed of the car and saves it as a float. Then it gets the image and converts it to an image object, as the frame is originally sent in Base64 format. This is finally converted into a numpy array, where it can be used by the model to predict the angle the car should steer towards.
After getting the angle that the car should drive towards, the throttle is calculated. This is done by subtracting the current speed divided by the speed limit from 1. The angle, throttle and speed are then printed to the terminal for ease of use, and the angle and throttle are sent to the simulator. This happens multiple times per second, which results in a continuously smooth driving car.
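A condensed sketch of this part of the drive script; img_process is the same preprocessing function used during training, and the model is loaded in the main block shown further below:

```python
import base64
from io import BytesIO

import numpy as np
import socketio
from flask import Flask
from PIL import Image

sio = socketio.Server()
app = Flask(__name__)
speed_limit = 20

def send_control(steering_angle, throttle):
    sio.emit('steer', data={'steering_angle': str(steering_angle),
                            'throttle': str(throttle)})

@sio.on('telemetry')
def telemetry(sid, data):
    speed = float(data['speed'])
    # The frame arrives Base64-encoded; decode it into a numpy array
    image = Image.open(BytesIO(base64.b64decode(data['image'])))
    image = np.asarray(image)
    image = img_process(image)
    image = np.array([image])                    # the model expects a batch dimension
    steering_angle = float(model.predict(image))
    throttle = 1.0 - speed / speed_limit         # ease off as we approach the speed limit
    print(steering_angle, throttle, speed)
    send_control(steering_angle, throttle)
```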
On connect the terminal is updated to let us know that the script is ready to use and the car is ready to drive. The car is also instructed to drive at an angle of 0 with a throttle of 0 so that it is still.
This is the function that sends the controls of the car from the Python script. It takes the steering angle and throttle value and uses sio to send them to the Udacity simulator.
Here the Python script itself is set up. The script is run as python drive.py model_name.h5, with the model supplied through the argument parser. The model is then loaded and the app is bound to the port that the Udacity simulator expects.
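Roughly, the connect handler and the script entry point look like this; port 4567 is the one the Udacity simulator listens on:

```python
import argparse

import eventlet
import eventlet.wsgi
from tensorflow.keras.models import load_model

@sio.on('connect')
def connect(sid, environ):
    print('Connected to the simulator')
    send_control(0, 0)            # keep the car still until the first frame arrives

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('model', type=str, help='path to the trained .h5 model')
    args = parser.parse_args()

    model = load_model(args.model)
    # Wrap the Flask app with the socket.io server and listen on the simulator's port
    app = socketio.WSGIApp(sio, app)
    eventlet.wsgi.server(eventlet.listen(('', 4567)), app)
```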
With all this code we managed to create a model that is able to drive around the lake track perfectly without crashing. This was achieved by gathering the data in a specific way, focusing on the parts where the model struggles to drive, then training the model using the right convolutional layers and the large number of images that could be produced with the batch_generator function. Finally, the model controls the car through the drive.py file.
Conclusion
At the end of this project, we have implemented two different ways of controlling a car: first using OpenCV and then using a deep learning model.
OpenCV turned out to be slightly trickier than we originally imagined, as finding the best-performing filters required a lot of experimenting. On top of that, we ran into multiple issues with the image colors being warped after processing them through different functions, and the lake track from Udacity also has its fair share of tricky locations. The blurring effect was our strongest weapon against the varying detail on the track. This program also utilizes a rolling average to compensate for changes in the track such as sharp turns or long stretches of straight road. All in all, it seems that our OpenCV program, after a lot of trial and error, is capable of driving around the track in one piece, depending on some dynamic variables within the game itself. Its greatest problem remains the sharp turn right after the bridge, which it has a chance of passing depending on its position while on the bridge. We theorize that it mostly has issues with this turn because we chose to focus on white and yellow lines, which are abundant around the track except for that single stretch where a dirt road rears its head. However, if it manages to pass this corner it seems to have relatively little trouble finishing the lap.
The deep learning part of this project took a lot of work to complete. Even after having the right code to train the model, it still took some time to get a model that could drive around the lake track perfectly, and a lot of training rounds to finally land on the proper strategy for capturing training data. Luckily we had access to a pretty powerful PC, which allowed us to train the model multiple times; this finally resulted in a model that is able to drive around the track without crashing.
Overall we are very proud of the work that we've done. However, we are saddened that we were not able to bring all of our ambitions to fruition. Given enough time we would have perfected the OpenCV program, and we would have liked to try applying our two methods to a game called Trackmania. But overall, we are still very happy with what we have and we learned a lot from this project.