I have a graph that I created from a pandas DataFrame. The graph has ~450k edges. When I try to run the `weighted_projected_graph` function, it runs for a long time (I have never seen it finish), presumably because of the size of this data set. What is a good method for reducing the size of this data set before projecting the bipartite graph?
I have tried narrowing it down to the largest connected components:
```python
trim1 = len([c for c in net.connected_component_subgraphs(g) if len(c) > 10])
C = net.Graph()
gg = sorted(net.connected_component_subgraphs(g), key=len, reverse=True)[trim1]
```
but I don't think this is giving me the results I want and, further, I'm not confident this is an analytically sound strategy. Does anybody have any other recommendations for reducing the size of this set?
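For reference, here is a minimal sketch of what I was aiming for — keeping *every* component above a size threshold and merging them, rather than indexing the sorted list by a count. The toy edges and the threshold of 2 are made up for illustration:

```python
import networkx as net

# Toy graph with three components of sizes 3, 2, and 4
g = net.Graph()
g.add_edges_from([(1, 'a'), (2, 'a'), (3, 'b'), (4, 'c'), (5, 'c'), (6, 'c')])

# Keep all components larger than the threshold, merged into one graph
big = [c for c in net.connected_components(g) if len(c) > 2]
gg = g.subgraph(set().union(*big)).copy()
print(len(gg))  # 7 — the size-2 component {3, 'b'} is dropped
```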
EDIT: full code, without reducing. What I would be trying to do with the above is swap out `g` below for `gg` from above in the call to `projected_graph`:
```python
import pandas as pd
import networkx as net
from networkx.algorithms import bipartite as bi

reviews = pd.read_csv(r"\BX-Book-Ratings.csv", sep=";", encoding="ISO-8859-1")
users = reviews['User-ID'].values.tolist()
books = reviews['ISBN'].values.tolist()
g = net.from_pandas_edgelist(reviews, 'User-ID', 'ISBN', ['Book-Rating'])
print(len(g))  # 445839
p = bi.projected_graph(g, users)
```