Question 4a)

Question 4b)

Question 4b)1,2,3)

    import pandas as pd

    df1 = pd.read_csv('/work/case1-perfmon-data.csv')  # Current DB
    df2 = pd.read_csv('/work/case2-perfmon-data.csv')  # Big DB

    ########################## 4(a) #############################################
    # Test duration = last timestamp minus first timestamp
    elapsed = df1['Time'].iloc[-1] - df1['Time'].iloc[0]
    elapsed2 = df2['Time'].iloc[-1] - df2['Time'].iloc[0]
    print("Duration of performance test 1: ", elapsed)
    print("Duration of performance test 2: ", elapsed2)

    ########################## 4(b) #############################################
    rows = len(df1.index)
    rows2 = len(df2.index)
    # Duration divided by row count, i.e. the mean time between samples
    frequency = elapsed / rows
    frequency2 = elapsed2 / rows2
    print("Frequency of data 1: ", frequency)
    print("Frequency of data 2: ", frequency2)

    ########################## 4(c) #############################################
    # Counter names contain backslashes, so they are written as raw strings.
    # Unprefixed counters are the DB machine; \\isp-01 is the Web/App machine.
    db_disk = r"\PhysicalDisk(_Total)\% Disk Time"
    db_cpu = r"\Processor(0)\% Processor Time"
    temps1 = r"\\isp-01\PhysicalDisk(_Total)\% Disk Time"
    temps2 = r"\\isp-01\Processor(0)\% Processor Time"
    temps3 = r"\\isp-01\Processor(1)\% Processor Time"

    DB1Disk = df1[db_disk].mean()
    DB1CPU = df1[db_cpu].mean()
    WebApp1Disk = df1[temps1].mean()
    # Pool the samples of both Web/App CPUs before averaging
    WebApp1CPU = pd.concat([df1[temps2], df1[temps3]]).mean()
    print("Data 1:")
    print(" DB Disk: ", DB1Disk, "%", ", DB CPU: ", DB1CPU, "%")
    print(" WebApp Disk: ", WebApp1Disk, "%", ", Web App CPU: ", WebApp1CPU, "%")

    DB2Disk = df2[db_disk].mean()
    DB2CPU = df2[db_cpu].mean()
    WebApp2Disk = df2[temps1].mean()
    WebApp2CPU = pd.concat([df2[temps2], df2[temps3]]).mean()
    print("Data 2:")
    print(" DB Disk: ", DB2Disk, "%", ", DB CPU: ", DB2CPU, "%")
    print(" WebApp Disk: ", WebApp2Disk, "%", ", Web App CPU: ", WebApp2CPU, "%")

    ########################## 4(d) #############################################
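The 4(b) calculation divides the test duration by the number of rows. As a hedged aside, n samples span n - 1 intervals, so a minimal sketch of the mean sampling interval could look like the following. The function name `mean_sampling_interval` and the timestamps are hypothetical and are not taken from the perfmon data.

```python
def mean_sampling_interval(timestamps):
    """Mean gap between consecutive samples, in the timestamps' own unit."""
    if len(timestamps) < 2:
        raise ValueError("need at least two samples")
    elapsed = timestamps[-1] - timestamps[0]
    # n samples are separated by n - 1 gaps
    return elapsed / (len(timestamps) - 1)


# Hypothetical timestamps in seconds, not read from the CSV files
samples = [0.0, 5.0, 10.0, 15.0, 20.0]
interval = mean_sampling_interval(samples)
print("mean interval (s):", interval)
print("sampling frequency (samples/s):", 1 / interval)
```

For long traces the difference between dividing by n and by n - 1 is negligible, so this mainly matters for short tests.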

Question 4 b) - 4) Database Management Server and Web Server

    # For the database management system and Web server processes there is no
    # per-process processor-time column, so CPU use is measured as
    # User Time + Privileged Time.
    for label, df in (("Current DB", df1), ("Big DB", df2)):
        DBM1CPU = (df[r"\Process(db2syscs)\% User Time"].mean()
                   + df[r"\Process(db2syscs)\% Privileged Time"].mean())
        web_server_utilization = (
            df[r"\\isp-01\Process(wHTTPg)\% Privileged Time"].mean()
            + df[r"\\isp-01\Process(wHTTPg)\% User Time"].mean())

        all_app_server_utilizations = 0
        for i in range(16):
            all_app_server_utilizations += (
                df[rf"\\isp-01\Process(server#{i})\% User Time"].mean()
                + df[rf"\\isp-01\Process(server#{i})\% Privileged Time"].mean())
        # Average over the 16 app server processes
        all_app_server_utilizations /= 16

        print(f"{label}:")
        print(f"CPU utilization of web server processes: {web_server_utilization}%")
        print(f"CPU utilization of app server processes: {all_app_server_utilizations}%")
        print(f"CPU utilization of database processes: {DBM1CPU}%")
        print()

Question 4b) - 4) Application Server Stuff

    cpuList1 = []
    cpuList2 = []
    # Pool the % User Time samples of all 16 app server processes
    for i in range(16):
        s = r"\\isp-01\Process(server#{})\% User Time".format(i)
        cpuList1.extend(df1[s].tolist())
        cpuList2.extend(df2[s].tolist())

    cpudf1 = pd.DataFrame(cpuList1, columns=['% User Time 1'])
    cpudf2 = pd.DataFrame(cpuList2, columns=['% User Time 2'])
    print(cpudf1.mean())
    print(cpudf2.mean())

Question 4c)

#### Question

Bookzilla test engineers have told you that there is very little virtual memory activity in their systems and that you need not worry about this factor during performance evaluation. Based on the perfmon data, do you agree with this assessment? Provide concrete reasons for your view.

#### Answer

    import pandas as pd

    df1 = pd.read_csv('/work/case1-perfmon-data.csv')
    df2 = pd.read_csv('/work/case2-perfmon-data.csv')


    def report_memory(df):
        # For each counter, print the DB machine mean, then the isp-01 mean
        for label, counter in (("page faults/sec", "Page Faults/sec"),
                               ("page input/sec", "Pages Input/sec"),
                               ("page output/sec", "Pages Output/sec")):
            db_mean = df[r"\Memory\{}".format(counter)].mean()
            isp_mean = df[r"\\isp-01\Memory\{}".format(counter)].mean()
            print(label + ": " + str(db_mean))
            print("isp\\" + label + ": " + str(isp_mean))


    print("case 1:")
    report_memory(df1)
    print("\ncase 2:")
    report_memory(df2)

For case 1, the database service shows little virtual memory activity, averaging 22.8 page faults per second. The web and application server, however, averages 796.5 page faults per second. Case 2 is similar: 175.5 on the database service and 742.97 on the web and application server. Page faults are caused by virtual memory activity, but Page Faults/sec counts both soft faults (resolved in memory) and hard faults (requiring a disk read), so a high value alone does not indicate heavy paging.

For case 1, Pages Input/sec is below 1 on both the database service and the web and application server. For case 2, it is above 1 on the database service, though still low, and below 1 on the web and application server. Because Pages Input/sec, which reflects pages read from disk to resolve hard faults, is low, we agree that there is not much virtual memory activity.

https://learn.microsoft.com/en-us/sql/relational-databases/performance-monitor/monitor-memory-usage?view=sql-server-ver16

https://www.poweradmin.com/blog/pages-per-second-counters/

Question 4d)

#### Question

Do you agree with the thread/process concurrency information provided by Bookzilla for the Web, application, and database servers? Provide a justification based on the perfmon data.

#### Answer

##### Application Server

Bookzilla stated that the web server process is assigned 1000 threads; however, the data shows 1008 threads for the web server process on isp-01, as seen here:

    import pandas as pd

    df1 = pd.read_csv('/work/case1-perfmon-data.csv')
    PF1 = df1[r"\\isp-01\Process(wHTTPg)\Thread Count"].mean()
    print(f"Thread count web server: {PF1}")

There are, however, additional threads beyond the ~1000: one is assigned to each of the 16 server instances (see, for example, \\isp-01\Process(server#0)\Thread Count), in addition to the 8 threads assigned to the srvrctrl process on isp-01.

    import pandas as pd

    df1 = pd.read_csv('/work/case1-perfmon-data.csv')
    server01_threads = df1[r"\\isp-01\Process(server#0)\Thread Count"].mean()
    isp01_srvctl_threads = df1[r"\\isp-01\Process(srvrctrl)\Thread Count"].mean()
    # Each label now matches the counter it reports
    print(f"isp01 server control process: {isp01_srvctl_threads}")
    print(f"isp01 server#0 threads (same for other server numbers): {server01_threads}")


##### Database Server

Bookzilla states that the database management system process has 33 concurrent threads, which is properly reflected in the perfmon data:

    import pandas as pd

    df1 = pd.read_csv('/work/case1-perfmon-data.csv')
    PF1 = df1[r"\Process(db2syscs)\Thread Count"].mean()
    print(f"Thread count database: {PF1}")

##### Verdict

The thread count provided by Bookzilla fails to account for the true number of threads acting on the application/web tier. It is, however, accurate for the database tier.

Therefore, we do not agree with Bookzilla's statements about the thread count.

Question 4e)

#### Question:

You will observe a slight discrepancy between what you computed in 4.b.4 and 4.b.3. For example, although the database management system process was the only process using the DB machine, its CPU utilization (computed in 4.b.4) is less than that of the CPU utilization of the DB machine computed in 4.b.3. Provide possible explanations for such mismatches.

#### Answer:

Some processor time may be lost in the transitions between the machine's operating system and the DB management system process. That time is charged to the machine but not to the process.

Another reason could be that the database machine needs additional CPU time to create and call the DB management system process.

Because multiple threads are present, the DB machine counter may double count CPU utilization, so its measurement is higher than that of the process itself.

The database management system process may be waiting for I/O operations to complete, such as reading from or writing to disk, which reduces its CPU utilization.

Question 4f)

Question 5)

Question a)

Let us now focus on application-level metrics such as throughput and response time. Compute the following for both Current DB and Big DB:

1)

Question:

The per-request mean response time is the sum of the time to establish a connection with the server, wait till the first byte of the response, and ultimately obtain the last byte of the response.

Answer:

Mean Response Time = Time To Open Connection + First Byte Time + Last Byte Time

    import pandas as pd

    hp_df1 = pd.read_csv('/work/case1-httperf-detailed-output.csv')
    hp_df2 = pd.read_csv('/work/case2-httperf-detailed-output.csv')

    # Clamp negative values to zero before summing
    hp_df1[hp_df1 < 0] = 0
    # Response time = sum of the three timing columns (positions 2 to 4)
    hp_df1['Response Time'] = hp_df1.iloc[:, 2:5].sum(axis=1)
    # Dividing by 1000 reports the mean in seconds
    mean_response_time_1 = hp_df1['Response Time'].mean() / 1000

    hp_df2[hp_df2 < 0] = 0
    hp_df2['Response Time'] = hp_df2.iloc[:, 2:5].sum(axis=1)
    mean_response_time_2 = hp_df2['Response Time'].mean() / 1000

    print("Case 1 Mean Response Time: %0.4f s\nCase 2 Mean Response Time: %0.4f s"
          % (mean_response_time_1, mean_response_time_2))

2)

Question:

The throughput in request completions/second.

Answer:

Throughput = Total Replies / Test Duration

Case 1: 89592 replies / 11136.199 s = 8.04511485472 replies/s

Case 2: 90399 replies / 12110.736 s = 7.46436880467 replies/s
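The throughput arithmetic above can be reproduced with a short sketch. The reply counts and durations are the figures quoted in this answer; the helper name `throughput` is only illustrative.

```python
def throughput(total_replies, duration_s):
    """Request completions per second over the whole test."""
    return total_replies / duration_s


# (total replies, test duration in seconds) as quoted in the answer
cases = {"Case 1": (89592, 11136.199), "Case 2": (90399, 12110.736)}
throughputs = {name: throughput(replies, duration)
               for name, (replies, duration) in cases.items()}

for name, x in throughputs.items():
    print(f"{name}: {x:.4f} replies/s")
```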

3)

Question:

The mean think time between successive requests from a customer

Answer:

From D2L: (Mean Connection Time - (Mean Replies * Mean Response Time)) / Mean Replies

Case 1: (368.5145 - (9.248 * 1.4906)) / 9.248 = 38.357s

Case 2: (375.8772 - (9.253 * 2.0198)) / 9.253 = 38.602s
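As a check on the think time figures, the formula above can be written as a small function. This is a sketch using only the numbers quoted in this answer; the name `mean_think_time` is hypothetical.

```python
def mean_think_time(mean_connection_time_s, mean_replies, mean_response_time_s):
    """Session time not spent waiting for responses, spread over the replies."""
    waiting = mean_replies * mean_response_time_s
    return (mean_connection_time_s - waiting) / mean_replies


# (mean connection time, mean replies per session, mean response time)
think_1 = mean_think_time(368.5145, 9.248, 1.4906)
think_2 = mean_think_time(375.8772, 9.253, 2.0198)
print(f"Case 1 think time: {think_1:.3f} s")
print(f"Case 2 think time: {think_2:.3f} s")
```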

4)

Question:

The mean number of concurrent customer sessions in the system. (Hint: You need to use Little’s law for this)

Answer:

From D2L: Average Sessions = Throughput * (Mean Response Time + Think Time)

Case 1: 8.04511485472 * (1.4906 + 38.357) = 320.578 Sessions

Case 2: 7.46436880467 * (2.0198 + 38.602) = 303.216 Sessions
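The Little's law step can be sketched as follows, assuming the throughput, response time, and think time values computed in the earlier parts. The function name `mean_sessions` is illustrative only.

```python
def mean_sessions(throughput, mean_response_time_s, think_time_s):
    """Little's law for an interactive system: N = X * (R + Z)."""
    return throughput * (mean_response_time_s + think_time_s)


sessions_1 = mean_sessions(8.04511485472, 1.4906, 38.357)
sessions_2 = mean_sessions(7.46436880467, 2.0198, 38.602)
print(f"Case 1: {sessions_1:.3f} sessions")
print(f"Case 2: {sessions_2:.3f} sessions")
```

Each session alternates between waiting for a response (R) and thinking (Z), which is why both terms appear in the time a session spends per request cycle.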

Question b)

Question:

Bookzilla’s test engineers have told you that the network was lightly utilized and that it can be ignored as a factor in your study. Is there any data available to back up this claim?

Answer:

The Net I/O field in the case 1 summary registers 54.1 KB/s, which is roughly 0.4 Mbps and therefore under 0.5% of the 100 Mbps maximum of their connection. The same holds for case 2, where the Net I/O is 51.3 KB/s. Hence it seems fair to ignore network utilization as a factor in this study.
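To make the comparison with the link capacity concrete, here is a minimal sketch of the unit conversion. It assumes that KB means 1024 bytes, which is the more conservative reading, and takes the 100 Mbps capacity from the answer above. The names are hypothetical.

```python
LINK_CAPACITY_BPS = 100_000_000  # 100 Mbps link, as stated in the answer
BYTES_PER_KB = 1024              # assumption: KB is read as 1024 bytes


def link_utilization_percent(net_io_kb_per_s):
    """Share of the link capacity used by the reported Net I/O rate."""
    bits_per_s = net_io_kb_per_s * BYTES_PER_KB * 8
    return bits_per_s / LINK_CAPACITY_BPS * 100


net_io = {"Case 1": 54.1, "Case 2": 51.3}  # KB/s from the test summaries
utilization = {name: link_utilization_percent(rate)
               for name, rate in net_io.items()}

for name, pct in utilization.items():
    print(f"{name}: {pct:.2f}% of link capacity")
```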

Question c)

Question:

From the analysis in a), discuss the implications of supporting a larger catalog of books on the experience of an end-user of Bookzilla.

Answer: Based on part a), supporting a larger book catalogue, as simulated by the case 2 data, increases the mean response time substantially: from about 1.49 s to about 2.02 s, roughly 36% more than with the smaller database. Throughput decreases from roughly 8.0 to 7.5 replies per second, and the mean number of concurrent sessions drops from approximately 321 to 303. The mean think time also increases, but only very slightly in relative terms. For an end user, the most important metric is the mean response time, which increased substantially with the larger database. However, at around 2 seconds it is still a relatively short wait, so its impact should only be felt for extremely large requests.
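The relative changes discussed above can be computed directly from the part a) results. The sketch below uses only values quoted in this section; `percent_change` is an illustrative helper.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# (Current DB value, Big DB value) from part a)
metrics = {
    "mean response time (s)": (1.4906, 2.0198),
    "throughput (replies/s)": (8.04511485472, 7.46436880467),
    "mean think time (s)": (38.357, 38.602),
    "concurrent sessions": (320.578, 303.216),
}
changes = {name: percent_change(old, new)
           for name, (old, new) in metrics.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```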

Question 6

a) 1, 2, 3, 4

Apply the utilization law to compute the mean demands placed by request on the following resources

    import pandas as pd

    df1 = pd.read_csv('/work/case1-perfmon-data.csv')  # Current DB
    df2 = pd.read_csv('/work/case2-perfmon-data.csv')  # Big DB

    # Counter names contain backslashes, so they are written as raw strings
    db_disk = r"\PhysicalDisk(_Total)\% Disk Time"
    db_cpu = r"\Processor(0)\% Processor Time"
    temps1 = r"\\isp-01\PhysicalDisk(_Total)\% Disk Time"
    temps2 = r"\\isp-01\Processor(0)\% Processor Time"
    temps3 = r"\\isp-01\Processor(1)\% Processor Time"

    DB1Disk = df1[db_disk].mean()
    DB1CPU = df1[db_cpu].mean()
    WebApp1Disk = df1[temps1].mean()
    # Pool the samples of both Web/App CPUs before averaging
    WebApp1CPU = pd.concat([df1[temps2], df1[temps3]]).mean()
    print("The values for Data 1: ", DB1Disk, DB1CPU, WebApp1Disk, WebApp1CPU)

    DB2Disk = df2[db_disk].mean()
    DB2CPU = df2[db_cpu].mean()
    WebApp2Disk = df2[temps1].mean()
    WebApp2CPU = pd.concat([df2[temps2], df2[temps3]]).mean()
    print("The values for Data 2: ", DB2Disk, DB2CPU, WebApp2Disk, WebApp2CPU)

    print("Now calculating Demand for each resource based on the Utilization Law:")
    print("Utilization(U) = Throughput(X) * Demand(D)")

    X1 = 8.04511485472  # Current DB throughput from 5.a.2
    X2 = 7.46436880467  # Big DB throughput from 5.a.2

    # Multiply by 0.01 to turn a percentage into a fraction, then D = U / X
    WebApp1CPUDemand = WebApp1CPU * 0.01 / X1
    DB1CPUDemand = DB1CPU * 0.01 / X1
    WebApp1DiskDemand = WebApp1Disk * 0.01 / X1
    DB1DiskDemand = DB1Disk * 0.01 / X1
    print("Case 1 demand results:")
    print(" WebApp CPU Demand: ", WebApp1CPUDemand, ", DB CPU Demand: ", DB1CPUDemand)
    print(" WebApp Disk Demand: ", WebApp1DiskDemand, ", DB Disk Demand: ", DB1DiskDemand)

    WebApp2CPUDemand = WebApp2CPU * 0.01 / X2
    DB2CPUDemand = DB2CPU * 0.01 / X2
    WebApp2DiskDemand = WebApp2Disk * 0.01 / X2
    DB2DiskDemand = DB2Disk * 0.01 / X2
    print("Case 2 demand results:")
    print(" WebApp CPU Demand: ", WebApp2CPUDemand, ", DB CPU Demand: ", DB2CPUDemand)
    print(" WebApp Disk Demand: ", WebApp2DiskDemand, ", DB Disk Demand: ", DB2DiskDemand)

b)

1)

    # Mean per-request demand placed by the Web server process
    # on the Web/App server machine's CPUs.
    # Resource Demand = Utilization of a particular resource / Throughput of entire system
    # Throughput as calculated in 5.a.2 = 8.045 (Current DB), 7.464 (Big DB)
    # The 0.01 factor turns the percentage into a fraction, as in 6.a
    web_server_utilization = (
        df1[r"\\isp-01\Process(wHTTPg)\% Privileged Time"].mean()
        + df1[r"\\isp-01\Process(wHTTPg)\% User Time"].mean())
    print(f'Current DB mean per-request demand for webserver is: {web_server_utilization * 0.01 / 8.04511485472}')
    print()

    web_server_utilization = (
        df2[r"\\isp-01\Process(wHTTPg)\% Privileged Time"].mean()
        + df2[r"\\isp-01\Process(wHTTPg)\% User Time"].mean())
    print(f'Big DB mean per-request demand for webserver is: {web_server_utilization * 0.01 / 7.46436880467}')

2)

    # Mean per-request demand placed by the App server processes (all 16 of
    # them put together) on the Web/App server machine's CPUs.
    # The 0.01 factor turns the percentage into a fraction, as in 6.a
    for label, df, x in (("Current DB", df1, 8.04511485472),
                         ("Big DB", df2, 7.46436880467)):
        all_app_server_utilizations = 0
        for i in range(16):
            all_app_server_utilizations += (
                df[rf"\\isp-01\Process(server#{i})\% User Time"].mean()
                + df[rf"\\isp-01\Process(server#{i})\% Privileged Time"].mean())
        all_app_server_utilizations /= 16
        print(f'{label} mean per-request demand for app server is: {all_app_server_utilizations * 0.01 / x}')
        if label == "Current DB":
            print()

3)

    # Mean per-request demand placed by the database management system
    # process on the DB server machine's CPU.
    # The 0.01 factor turns the percentage into a fraction, as in 6.a
    DBM1CPU = (df1[r"\Process(db2syscs)\% User Time"].mean()
               + df1[r"\Process(db2syscs)\% Privileged Time"].mean())
    print(f'Current DB mean per-request demand for database is: {DBM1CPU * 0.01 / 8.04511485472}')
    print()

    DBM1CPU = (df2[r"\Process(db2syscs)\% User Time"].mean()
               + df2[r"\Process(db2syscs)\% Privileged Time"].mean())
    print(f'Big DB mean per-request demand for database is: {DBM1CPU * 0.01 / 7.46436880467}')

c)

Question: You will observe a slight mismatch between the total demand you calculated for a resource in 6.a and the sum of the demands placed on that resource by processes using that resource (6.b). Explain the reason for this mismatch.

d)

Question: Compare the resource demands you computed for the Current DB and Big DB scenarios. Discuss reasons for any significant differences that you observe. Discuss whether these demands provide us any insights on the kind of additional resources needed to satisfy the planned expansion of Bookzilla.

Demands for case 2 are higher than for case 1, especially the CPU demand. This is because the case 2 data set is sized for the expansion plan rather than for the current system, so the CPU and disk are more heavily utilized and stressed. Since U = X * D, we have D = U / X, so a much higher U at a similar throughput means a higher demand.

To bring utilization back down while sustaining the same throughput X, Bookzilla needs to reduce the per-request demand D, for example with faster processors, and can potentially improve load balancing between the processors.
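The relationship U = X * D used in this answer can be illustrated with a short sketch. The 40% utilization figure and the "twice as fast" processor are hypothetical values chosen for illustration and are not measurements from the perfmon data; only the throughput is taken from 5.a.2.

```python
def demand(utilization_percent, throughput):
    """Utilization law: U = X * D, so D = U / X (seconds per request)."""
    return (utilization_percent * 0.01) / throughput


def utilization_percent(demand_s, throughput):
    """Utilization, in percent, implied by a demand and a throughput."""
    return demand_s * throughput * 100


X = 7.46436880467  # Big DB throughput from 5.a.2 (requests/s)
U = 40.0           # hypothetical CPU utilization in percent, not measured

D = demand(U, X)
# Hypothetical upgrade: a CPU twice as fast halves the per-request demand
U_after = utilization_percent(D / 2, X)

print(f"demand per request: {D:.4f} s")
print(f"utilization after halving the demand: {U_after:.1f}%")
```

At a fixed throughput, utilization scales linearly with demand, which is why faster hardware (lower D) is the lever that reduces U.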
