Social Network Analysis
import networkx as nx
import matplotlib.pyplot as plt
import pyvis as pv
import pandas as pd
def display_html(filename):
    # Render a saved HTML file (for example a pyvis graph) inline in the notebook
    from IPython.display import display, HTML
    with open(filename, 'r') as f:
        display(HTML(f.read()))
Social Network Graph
To demonstrate the power of NetworkX, I use a small built-in graph, the Les Misérables graph, which returns the co-appearance network of characters in the novel Les Misérables.
G = nx.les_miserables_graph()
Basic information
As you can see below:
This graph has 77 nodes and 254 edges.
It is an undirected graph.
All nodes are connected, so the graph forms a single component.
print(G)  # nx.info() was removed in NetworkX 3.0; printing the graph gives the same summary
print('directed' if nx.is_directed(G) else 'undirected')
print('connected' if nx.is_connected(G) else 'not connected')
Graph with 77 nodes and 254 edges
undirected
connected
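The same facts can also be read as numbers rather than printed strings, which is handy for later calculations. This is a minimal sketch that rebuilds the graph so it runs on its own; the variable names are my own choice.

```python
import networkx as nx

G = nx.les_miserables_graph()

# Basic size of the network
n_nodes = G.number_of_nodes()
n_edges = G.number_of_edges()

# A connected graph has exactly one connected component
n_components = nx.number_connected_components(G)

print(f"{n_nodes} nodes, {n_edges} edges, {n_components} connected component(s)")
```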
Visualization
First I use the simple NetworkX draw method to visualize the graph with a spring layout, as below:
pos = nx.spring_layout(G)
nx.draw(G, pos=pos, node_color='green', alpha=0.5, width=0.5)
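The spring layout starts from random positions, so the picture changes on every run. If a reproducible drawing is wanted, one option is to pass a fixed seed to spring_layout. The sketch below only compares two layouts and does not draw anything; the seed value 42 is an arbitrary assumption.

```python
import networkx as nx
import numpy as np

G = nx.les_miserables_graph()

# Two layouts computed with the same seed (42 is an arbitrary choice)
pos_a = nx.spring_layout(G, seed=42)
pos_b = nx.spring_layout(G, seed=42)

# Each layout maps a node to a 2D coordinate array
same_layout = all(np.allclose(pos_a[n], pos_b[n]) for n in G.nodes)
print("layouts identical:", same_layout)
```

Reusing one seeded pos for every later plot also keeps the bridge and community figures comparable with each other.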
Then, for a better interactive visualization, I use the pyvis package:
net = pv.network.Network()
net.from_nx(G)
net.show_buttons(filter_=['physics'])
net.show('graph.html')
display_html('graph.html')
Exploratory Analysis
Average Degree
nodes_degree = nx.degree(G)
display(nodes_degree)
# Collect the degree of every node into a plain list
degree_list = [d for (n, d) in nodes_degree]
av_degree = sum(degree_list) / len(degree_list)
print('The average degree is %s' % av_degree)
The average degree is 6.597402597402597
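In an undirected graph every edge contributes to the degree of two nodes, so the average degree is also 2 * edges / nodes, here 2 * 254 / 77. A short check, rebuilding the graph so the snippet is self-contained:

```python
import networkx as nx

G = nx.les_miserables_graph()

# Each undirected edge adds 1 to the degree of both endpoints
avg_degree_formula = 2 * G.number_of_edges() / G.number_of_nodes()

# Same quantity computed directly from the degree view
degrees = [d for _, d in G.degree()]
avg_degree_direct = sum(degrees) / len(degrees)

print(avg_degree_formula, avg_degree_direct)
```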
Degree Distribution
Plot the degree distribution, with the average degree marked as a dashed line, to see how connections are spread across the characters.
plt.hist(degree_list,label='Degree Distribution')
plt.axvline(av_degree,color='r',linestyle='dashed',label='Average Degree')
plt.legend()
plt.xlabel('Degree')
plt.ylabel('Number of Nodes')
plt.title('Graph Node Degree')
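The histogram groups degrees into bins, so exact counts are hard to read from it. As a complement, the exact number of nodes per degree can be tabulated with a Counter. This is only a sketch; the names degree_counts, top_node and top_degree are my own.

```python
from collections import Counter

import networkx as nx

G = nx.les_miserables_graph()

# Map each degree value to the number of nodes that have it
degree_counts = Counter(d for _, d in G.degree())

for degree, count in sorted(degree_counts.items()):
    print(f"degree {degree:2d}: {count} node(s)")

# The node with the largest degree is the most connected character
top_node, top_degree = max(G.degree(), key=lambda pair: pair[1])
print(f"highest degree: {top_node} ({top_degree})")
```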
Centrality
Degree Centrality
degree_centrality = nx.degree_centrality(G)
display(degree_centrality)
closeness_centrality = nx.closeness_centrality(G)
display(closeness_centrality)
node_betweenness_centrality = nx.betweenness_centrality(G)
display(node_betweenness_centrality)
edge_betweenness_centrality = nx.edge_betweenness_centrality(G)
display(edge_betweenness_centrality)
centralities = {
'node_betweenness_centrality': node_betweenness_centrality,
'closeness_centrality': closeness_centrality,
'degree_centrality': degree_centrality,
}
centralities_df = pd.DataFrame.from_dict(centralities)
display(centralities_df)
Column ranges:
node_betweenness_centrality (float64): 0.0 - 0.5699890527836184
closeness_centrality (float64): 0.25675675675675674 - 0.6440677966101694

First rows of the table:
                  node_betweenness_centrality   closeness_centrality
Napoleon          0                             0.3015873016
Myriel            0.1768421053                  0.4293785311
MlleBaptistine    0                             0.4130434783
MmeMagloire       0                             0.4130434783
CountessDeLo      0                             0.3015873016
Geborand          0                             0.3015873016
Champtercier      0                             0.3015873016
Cravatte          0                             0.3015873016
Count             0                             0.3015873016
OldMan            0                             0.3015873016
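The table is easier to interpret once it is sorted. A possible way to list the most central characters for one measure is sketched below; it rebuilds the same three columns so it runs on its own, and the choice of showing five rows is arbitrary.

```python
import networkx as nx
import pandas as pd

G = nx.les_miserables_graph()

# One column per centrality measure, one row per character
centralities_df = pd.DataFrame({
    'node_betweenness_centrality': nx.betweenness_centrality(G),
    'closeness_centrality': nx.closeness_centrality(G),
    'degree_centrality': nx.degree_centrality(G),
})

# Characters that lie on the largest share of shortest paths
top_by_betweenness = centralities_df.sort_values(
    'node_betweenness_centrality', ascending=False
).head(5)
print(top_by_betweenness)
```

Sorting by 'closeness_centrality' or 'degree_centrality' instead gives the ranking for the other measures.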
# A bridge is an edge whose removal disconnects the graph
bridges = list(nx.bridges(G))
non_bridges = G.edges - bridges
print('we have %s bridges in the graph' % len(bridges))
display(bridges)
nx.draw_networkx_edges(G, pos, edgelist=bridges, style='solid', width=1.5, edge_color='red')
nx.draw_networkx_edges(G, pos, edgelist=non_bridges, style='solid', width=0.7, edge_color='black')
nx.draw_networkx_nodes(G, pos, node_size=100, alpha=0.2)
display('the red edges are the bridges')
we have 18 bridges in the graph
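To make the definition concrete: removing a bridge splits the graph into two parts, while removing any other edge leaves it connected. The sketch below checks this on a copy of the graph; the helper disconnects is a hypothetical name of my own, not a NetworkX function.

```python
import networkx as nx

G = nx.les_miserables_graph()
bridges = list(nx.bridges(G))


def disconnects(graph, edge):
    """Return True if removing `edge` leaves the graph disconnected."""
    H = graph.copy()  # work on a copy so the original graph is untouched
    H.remove_edge(*edge)
    return not nx.is_connected(H)


# Compare edges as unordered pairs so orientation does not matter
bridge_set = {frozenset(e) for e in bridges}
non_bridges = [e for e in G.edges if frozenset(e) not in bridge_set]

all_bridges_disconnect = all(disconnects(G, e) for e in bridges)
no_non_bridge_disconnects = not any(disconnects(G, e) for e in non_bridges)
print(all_bridges_disconnect, no_non_bridge_disconnects)
```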
density = nx.density(G)
print('The edge density is %s' % density)
The edge density is 0.08680792891319207
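For an undirected graph the density is the number of edges divided by the number of possible edges, 2m / (n(n - 1)). With m = 254 and n = 77 this is 508 / 5852, which matches the value above. A quick check:

```python
import networkx as nx

G = nx.les_miserables_graph()

n = G.number_of_nodes()
m = G.number_of_edges()

# Actual edges over the n * (n - 1) / 2 possible undirected edges
density_manual = 2 * m / (n * (n - 1))

print(density_manual, nx.density(G))
```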
triadic_closure = nx.transitivity(G)
print("Triadic closure:", triadic_closure)
Triadic closure: 0.49893162393162394
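Transitivity is the fraction of connected triples (two edges sharing a node) that are closed into a triangle: 3 * triangles / triples. The following sketch recomputes it from per-node triangle counts and degrees, assuming a simple undirected graph without self-loops, which is the case here.

```python
import networkx as nx

G = nx.les_miserables_graph()

# nx.triangles counts, per node, the triangles that node belongs to,
# so every triangle is counted three times in the sum
n_triangles = sum(nx.triangles(G).values()) / 3

# A node of degree d is the centre of d * (d - 1) / 2 connected triples
n_triples = sum(d * (d - 1) / 2 for _, d in G.degree())

transitivity_manual = 3 * n_triangles / n_triples
print(transitivity_manual, nx.transitivity(G))
```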
def display_communities(communities):
    # Print each community and draw its nodes in a distinct color
    print("we found %s communities" % len(communities))
    colors = ['red','green','blue','black','orange', 'yellow', 'purple']
    counter = 0
    for community in communities:
        counter += 1
        print("community_%s is:" % counter)
        print(', '.join(community), '\n')
        nx.draw_networkx_nodes(G, pos, nodelist=community, node_color=colors.pop(), alpha=0.5)
    # Draw all edges once, after the nodes of every community
    nx.draw_networkx_edges(G, pos, style='dashed', width=0.2)
communities = nx.algorithms.community.modularity_max.greedy_modularity_communities(G)
display_communities(communities)
modularity = nx.algorithms.community.quality.modularity(G, communities)
print("The modularity of this communities is: %s" % modularity)
partition_quality = nx.algorithms.community.quality.partition_quality(G, communities)
print("The coverage of this communities is: %s \nand the perfomance is: %s" % partition_quality)
we found 5 communities
community_1 is:
Valjean, Marguerite, Champmathieu, Fauchelevent, Myriel, Isabeau, Cravatte, Cochepaille, CountessDeLo, Labarre, Judge, Chenildieu, MmeDeR, MotherInnocent, Count, MlleBaptistine, MmeMagloire, Gervais, Brevet, OldMan, Geborand, Woman1, Napoleon, Champtercier, Scaufflaire, Gribier
community_2 is:
Child2, Courfeyrac, Combeferre, Gavroche, Bossuet, Prouvaire, Jondrette, Joly, Child1, Feuilly, MmeBurgon, MotherPlutarch, Grantaire, Mabeuf, Enjolras, Bahorel, MmeHucheloup
community_3 is:
Javert, Anzelma, Boulatruelle, Perpetue, Fantine, Babet, Eponine, Bamatabois, Montparnasse, Brujon, MmeThenardier, Thenardier, Simplice, Gueulemer, Claquesous
community_4 is:
Marius, BaronessT, Cosette, MlleGillenormand, LtGillenormand, Woman2, MlleVaubois, Pontmercy, MmePontmercy, Toussaint, Gillenormand, Tholomyes, Magnon
community_5 is:
Favourite, Dahlia, Fameuil, Blacheville, Listolier, Zephine
The modularity of this communities is: 0.4729424449732302
The coverage of this communities is: 0.7322834645669292
and the perfomance is: 0.815105946684894
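Coverage and performance have simple definitions that can be recomputed by hand. Coverage is the share of edges that fall inside a community. Performance is the share of node pairs that are classified correctly: connected pairs inside a community plus unconnected pairs across communities. The sketch below assumes an undirected graph and a true partition (no overlapping communities); all variable names are my own.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
from networkx.algorithms.community.quality import partition_quality

G = nx.les_miserables_graph()
communities = greedy_modularity_communities(G)

# Look up the community index of every node
node_to_comm = {n: i for i, c in enumerate(communities) for n in c}

n = G.number_of_nodes()
m = G.number_of_edges()
intra_edges = sum(1 for u, v in G.edges if node_to_comm[u] == node_to_comm[v])
inter_edges = m - intra_edges

# Node pairs inside communities versus pairs across communities
total_pairs = n * (n - 1) / 2
intra_pairs = sum(len(c) * (len(c) - 1) / 2 for c in communities)
inter_pairs = total_pairs - intra_pairs

coverage_manual = intra_edges / m
performance_manual = (intra_edges + (inter_pairs - inter_edges)) / total_pairs

coverage_nx, performance_nx = partition_quality(G, communities)
print(coverage_manual, performance_manual)
```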
# Communities built from adjacent cliques of size 5
k_clique_communities = list(nx.algorithms.community.kclique.k_clique_communities(G, 5))
display_communities(k_clique_communities)
we found 5 communities
community_1 is:
Marius, Courfeyrac, Combeferre, Valjean, Bossuet, Gavroche, Prouvaire, Joly, Feuilly, Grantaire, Enjolras, Mabeuf, Bahorel, MmeHucheloup
community_2 is:
Judge, Chenildieu, Valjean, Brevet, Champmathieu, Bamatabois, Cochepaille
community_3 is:
Valjean, Gavroche, Gueulemer, Javert, Cosette, Fantine, Babet, Eponine, Montparnasse, Brujon, Thenardier, MmeThenardier, Claquesous
community_4 is:
Marius, Cosette, Valjean, LtGillenormand, MlleGillenormand, Gillenormand
community_5 is:
Favourite, Blacheville, Fantine, Listolier, Dahlia, Fameuil, Tholomyes, Zephine
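Unlike the previous method, k-clique communities may overlap, and nodes that belong to no 5-clique are left out entirely. In the output above, for example, Valjean appears in several communities. Because the result is not a partition, partition-based measures such as coverage and performance do not apply to it directly. A small sketch that lists overlapping and uncovered nodes, with names of my own choosing:

```python
from collections import Counter

import networkx as nx
from networkx.algorithms.community import k_clique_communities

G = nx.les_miserables_graph()
k_clique = list(k_clique_communities(G, 5))

# Count in how many communities each node appears
membership = Counter(node for community in k_clique for node in community)
overlapping = {node: k for node, k in membership.items() if k > 1}

# Nodes that are not part of any k-clique community
uncovered = set(G.nodes) - set(membership)

print("nodes in more than one community:", overlapping)
print("nodes in no community:", len(uncovered))
```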
# Fluid communities with a fixed number of communities (k = 6)
fluid_communities = list(nx.algorithms.community.asyn_fluid.asyn_fluidc(G, 6))
display_communities(fluid_communities)
partition_quality = nx.algorithms.community.quality.partition_quality(G, fluid_communities)
print("The coverage of this communities is: %s \nand the perfomance is: %s" % partition_quality)
we found 6 communities
community_1 is:
BaronessT, Cosette, MlleGillenormand, LtGillenormand, Woman2, MlleVaubois, Pontmercy, MmePontmercy, Toussaint, Gillenormand, Magnon
community_2 is:
Marius, Courfeyrac, Combeferre, Feuilly, Gavroche, Bossuet, Prouvaire, Grantaire, Mabeuf, Enjolras, MmeHucheloup, Bahorel, Joly
community_3 is:
MlleBaptistine, MmeMagloire, OldMan, Geborand, Cravatte, Napoleon, CountessDeLo, Champtercier, Myriel, Count
community_4 is:
Boulatruelle, Blacheville, Dahlia, Javert, Thenardier, Zephine, Favourite, Perpetue, Gueulemer, Anzelma, Fantine, Babet, Eponine, Listolier, Montparnasse, Brujon, MmeThenardier, Simplice, Fameuil, Tholomyes, Claquesous
community_5 is:
MotherInnocent, Valjean, Marguerite, Gervais, Brevet, Champmathieu, Fauchelevent, Woman1, Bamatabois, Isabeau, Cochepaille, Labarre, Judge, Chenildieu, MmeDeR, Scaufflaire, Gribier
community_6 is:
Child2, Child1, MotherPlutarch, MmeBurgon, Jondrette
The coverage of this communities is: 0.7755905511811023
and the perfomance is: 0.8653451811346549
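The fluid communities algorithm is randomized, so the communities listed above can differ from run to run. If repeatable results are needed, asyn_fluidc accepts a seed argument. The sketch below runs it twice with the same seed and compares the results; the seed value 42 and the helper name run_fluid are assumptions of mine.

```python
import networkx as nx
from networkx.algorithms.community import asyn_fluidc

G = nx.les_miserables_graph()


def run_fluid(graph, k, seed):
    """Return the communities as sorted lists so two runs can be compared."""
    return sorted(sorted(c) for c in asyn_fluidc(graph, k, seed=seed))


first_run = run_fluid(G, 6, seed=42)
second_run = run_fluid(G, 6, seed=42)

print("same result with same seed:", first_run == second_run)
```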