Project overview
Networks/graphs are a valuable tool for modeling social data. Typically, nodes model individual entities and edges models related pairs of entities. Given such a graph, a common task to visualize the structure of the relationships between the various entities. Two important tasks in this space are:
- Graph layout(Links to an external site.) – Assigning geometric positions to nodes of graph in preparation for drawing the graph with nodes visualized as circles and edges visualized as line segments. “Good” graph layouts visually expose the structure of the graph.
- Community detection(Links to an external site.) – Partitioning the nodes of the graph into communities where each community is modeled by a set of nodes that densely connected internally. For social networks, communities correspond to sub-groups of entities that highly inter-related.
For this project, we will use visualization to assess the effectiveness of various method (both built-in and custom-coded) for these problems. Part 1 will focus on graph layout and part 2 will focus on community detection.
Clicking the “My Work” button at the bottom of Vocareum’s instruction page opens a Jupyter notebook server that displays the work directory for this week’s project. This directory contains two files and a sub-directory:
- ipynb- Jupyter notebook that contains the template for this week’s project.
- py- Python file that contains testing code specified to this week’s project.
- data- Directory that contains the files pickle and facebook_combined.txt.
Starting from project7_template.ipynb, your task is to follow the instructions in the project description below and create a solution notebook project7_solution.ipynb. This notebook will be machine-graded by our testing code when you hit “Submit” and then peer-graded by your classmates after the submission deadline has passed.
Before you begin the project, we suggest that you review the provided code in the template notebook. The code includes several functions for computing a random layout of a graph and plotting the result. Note that the function plot_graph() draws a graph using the draw_networkx() method from the networkx module. plot_graph() includes several optional arguments that are useful when implementing the project.
Project description – part 1
For part one of the project, your job is to compare networkx’s built-in method for graph layout with a custom-coded layout method that you will implement. In networkx, a graph’s layout is specified as a dictionary whose keys are nodes and values are positions (specified as a pair of numbers). networkx has a built-in method spring_layout() that computes geometric layouts for the graph by using force-directed graph drawing (Links to an external site.) algorithm.
Your main task is to implement a machine-graded graph layout method that positions nodes such that the geometric distances between pairs of connected nodes approximate the lengths of the shortest paths between these nodes. This method does the following:
- Computes the lengths of all pairs of shortest paths for a given graph using the all_pairs_shortest_path()method from networkx,
- Defines an error function that computes that the square of the difference between the lengths of these paths and the distance between the corresponding endpoints of these paths for a given layout of the graph,
- Minimizes this error function using optimizeas done in this week’s practice project.
Finally, you will implement a peer-graded graph plotting function that creates side-by-side plots of spring-based versus distance-based graph layouts and compare the plots produced by our testing code in Question 1.
Implement the machine-graded function get_node_indices() (7 points)
Your first task is to implement the function get_node_indices(grph) that takes a graph grph and returns a dictionary whose keys are the node labels and whose values are integer indices from 00 to n−1n−1 where nn is the number of nodes in the graph. A node’s index corresponds to its position in grph.nodes.
Implement the machine-graded function distance_error() (7 points)
Your second task is to implement the function distance_error(flat_node_pos, path_lengths) that takes a 1D numpy array flat_node_pos of the form [x0 y0 x1 y1 …] and a nested dictionary path_lengths with the nested keys as node indices of the start and end of the path and the values as the length of the corresponding path from the start node to the end node (path_lengths will be computed in the next function distance_layout through all_pairs_shortest_path_length() method). The function should return the sum of the squares of differences between the path lengths in path_lengths and the geometric distances between nodes in flat_node_pos.
Implement the machine-graded function distance_layout() (10 points)
Your third task is to implement the function distance_layout(grph, seed) that takes a graph grph and an optional integer seed. The function returns a dictionary of node positions keyed by nodes that is computed by minimizing the quadratic function distance_error(flat_node_pos, shortest_paths) where shortest_paths is a fixed dictionary computed from grph.
shortest_paths should be the nested dictionary whose keys are node indices and whose values are the lengths of the shortest path lengths corresponding to pairs of these node indices. This dictionary can be computed in 2 lines of Python by applying the all_pairs_shortest_path_length() method to a copy of grph whose nodes have been relabeled using the dictionary returned by get_node_indices().
If a seed is provided, the function initializes the random number generator with this seed to ensure the resulting function is machine-gradable. In particular, the initial positions for the nodes in the optimization code should be generated via nn calls to random_node_pos() where nn is the number of nodes in grph.
Implement the peer-graded function plot_spring_vs_distance() (16 points)
Your final task is to implement the function plot_spring_vs_distance(grph, with_labels, node_colors, seed) that takes as input a graph grph and optional arguments with_labels, node_colors, and seed. This function returns a matplotlib figure consisting of two side-by-side plots comparing the spring layout and the distance layout for grph. Use the provided seed for both spring layout and distance layout. The plots should include node labels and node colors if specified. You can leverage the provided function plot_graph() to plot these graphs.
Each individual plot in the side-by-side comparison should include an appropriate title that includes the name of the graph and distinguishes spring vs distance.
Answer optional peer-graded question 1 (0 points)
Based on these examples, what are the advantages/disadvantages of the two layout methods?
Project description – part 2
In the second part of this project, we will assess the quality of the communities computed by the python-louvain package . This package implements a best_partition() method that automatically determines the number of communities for a graph and then partitions the graph into these communities. Our assessment will evaluate this method on simple graphs that have a pre-determined community structure and Facebook ego network data.
Specifically, your task is to implement three functions in Python and then answer a final question. The first machine-graded function uses the python-louvain module to compute communities in a graph. The second and third peer-graded function plot the community structure of two types of graphs. You will then review the resulting plots and discuss its effectiveness in Question 2.
Implement the machine-graded function get_communities() (6 points)
Your first task is to implement the function get_communities(grph, seed) that takes a graph grph as input and returns a list of integers indicating the community for each corresponding node. The function should use the best_partition() method (Links to an external site.) from the python-louvain module (imported as community) and initialize the parameter random_state to use the provided seed.
Implement the peer-graded function plot_caveman_communities() (10 points)
Your second task is to implement the function plot_caveman_communities(num_cliques, clique_size, prob, seed) that takes as input integers num_cliques, clique_size and optional arguments prob (float), seed (int). The function should
- Compute relaxed caveman community graph using the relaxed_caveman_graph()method (Links to an external site.) from networkx with the provided seed,
- Compute communities using get_communities()using the provided seed,
- Return a matplotlibfigure consisting of a plot of the resulting communities using a spring layout with the provided seed. You can leverage the provided function plot_graph() to plot these graphs. The node in the graph should be numbered. Nodes in the same community should share a common color using the node_color option in plot_graph(). The plot should include an appropriate title.
Note that you may need to review this page on cliques (Links to an external site.) from graph theory when attempting to understand the structure of relaxed caveman graphs.
Implement the peer-graded function plot_facebook_communities() (4 points)
Your final coding task is to implement the function plot_facebook_communities(seed) that computes and plots communities for a Facebook ego network as described here (Links to an external site.). The function should
- Read the Facebook ego network data in txt(in the data directory) using networkx,
- Compute communities in the resulting graph using get_communities()using the provided seed,
- Return a matplotlibfigure consisting of a plot of the computed communities using a spring layout with the provided seed. You can leverage the provided function plot_graph() to plot these graphs. The communities should be visualized using the node_color option in plot_graph(). The plot should include an appropriate title.
- Instead of plot_graph(), you should use draw_networkx()directly to produce a higher quality plot. Specifically, the nodes in the plot should not include labels and should have a node size of 35 to avoid excessive overlap of the nodes.
Answer optional peer-graded question 2 (0 points)
Based on these examples, how well does the get_community() function detect communities in a graph?
Machine-grading
Each graded project has a total score of 70 points. The machine graded portion of the project counts for 40 points with the following breakdown:
- 30 points – correctness as determined by pyon the test in project7_tests.pickle.
- 10 points – Pylint’s automated assessment of how well your code follows the coding guidelinesfor the class.
Note that some of required machine-graded functions include companion test functions of these form test_machine_graded_functions(). These companion functions are included to help generate tests cases for project7_tests.pickle. You are encourage to add (or create more) tests for your project as you test it. (Remember to tag any new cells that contain these extra tests as “notebook_only” so machine grader ignores these cells.)
The provided file project7_checker.py includes a copy of machine-grading code specific to this project. You are welcome to examine (and even modify) this code as your implement/test your project. Remember that the “Build” command runs your version of project7_checker.py while the “Submit” command (which records a grade) runs our original hidden copy of project7_checker.py.
Peer-grading
The peer-graded component of each project accounts for the remaining 30 points (out of the 70 points total). Each required peer-graded plotting functions includes a companion test function of the form test_peer_grade_function(). This companion functions makes required test calls to your plotting function. Be sure to run these companion functions to produce the required plots. Save the resulting output (including the plots) in your submitted Jupyter notebook.
Rubrics for peer-graded plotting functions are usually of the follow form:
- Formatting – (30%) – do the included plots follow the plotting guidelines?
- Correctness – (60%) – are the included plots corrects?
- Reproducibility – (10%) – do the plots computed by running the notebook match the included plots?
tips project 7 Project 7 overview
APA
CLICK HERE FOR FURTHER ASSISTANCE ON THIS ASSIGNMENT
The post Networks/graphs are a valuable tool for modeling social data. Typically, nodes model individual entities and edges models related pairs of entities. Given such a graph, a common task to visualize the structure of the relationships between the various entities. Two important tasks in this space are: Graph layout(Links to an external site.) – Assigning geometric positions to nodes of graph in preparation for drawing the graph with nodes visualized as circles and edges visualized as line segments. “Good” graph layouts visually expose the structure of the graph. Community detection(Links to an external site.) – Partitioning the nodes of the graph into communities where each community is modeled by a set of nodes that densely connected internally. For social networks, communities correspond to sub-groups of entities that highly inter-related. appeared first on Apax Researchers.