
I have a graph that I created from a pandas data frame. The graph has roughly 450k nodes. When I try to run the `weighted_projected_graph` function, it runs for a long time (I have not seen it finish), presumably because of the size of this data set. What is a good method for reducing the size of this data set before computing the projection?

I have tried narrowing it down by using the most connected components:

```
# count the components with more than 10 nodes
# (connected_component_subgraphs was removed in NetworkX 2.4;
# build subgraphs from connected_components instead)
trim1 = len([c for c in net.connected_components(g) if len(c) > 10])
components = sorted((g.subgraph(c).copy() for c in net.connected_components(g)),
                    key=len, reverse=True)
# note: this picks a single component by index, not the union of the large ones
gg = components[trim1]
```
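If the goal is to keep *every* component above a size threshold rather than a single one by index, one way is to take the union of their node sets and build one subgraph. A minimal sketch on a toy graph (threshold 3 here for illustration; the post uses 10):

```python
import networkx as net

# toy graph: three components of 5 nodes each and one of 3 nodes
g = net.Graph()
g.add_edges_from([("u1", "b1"), ("u2", "b1"), ("u3", "b1"), ("u3", "b2"),
                  ("u4", "b3"), ("u5", "b3"), ("u6", "b3"), ("u6", "b4"),
                  ("u7", "b5"), ("u8", "b5"), ("u9", "b5"), ("u9", "b6"),
                  ("u10", "b7"), ("u11", "b7")])

# keep the union of all components larger than the threshold
big = [c for c in net.connected_components(g) if len(c) > 3]
gg = g.subgraph(set().union(*big)).copy()
print(gg.number_of_nodes())  # 15 (the 3-node component is dropped)
```

This yields one graph containing all large components, which can then be passed to the projection in place of `g`.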

but I don't think this is giving me the results I want and, further, I'm not confident this is an analytically sound strategy. Does anybody have any other recommendations for reducing the size of this set?
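One common alternative (not from the post, just a suggestion) is to trim the data frame itself before building the graph, dropping users and books with fewer than some number of ratings. A sketch with a toy stand-in for the ratings frame; `k` is an illustrative threshold, and note a single pass can leave some counts below `k` again, so you may want to repeat it until stable:

```python
import pandas as pd

# toy stand-in for BX-Book-Ratings; column names match the post
reviews = pd.DataFrame({
    "User-ID":     [1, 1, 1, 2, 2, 3, 4, 4, 4],
    "ISBN":        ["a", "b", "c", "a", "b", "c", "a", "b", "d"],
    "Book-Rating": [5, 4, 3, 5, 2, 1, 4, 4, 5],
})

# drop rows whose user or book has fewer than k ratings
k = 2
user_counts = reviews["User-ID"].map(reviews["User-ID"].value_counts())
book_counts = reviews["ISBN"].map(reviews["ISBN"].value_counts())
trimmed = reviews[(user_counts >= k) & (book_counts >= k)]
print(len(trimmed))  # 7 of the original 9 rows survive
```

The smaller frame then goes into `from_pandas_edgelist` as before.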

EDIT: full code, without reducing. With the approach above, I would swap `gg` in for `g` in the call to `bi.projected_graph` below.

```
import pandas as pd
import networkx as net
from networkx.algorithms import bipartite as bi

reviews = pd.read_csv(r"\BX-Book-Ratings.csv", sep=";", encoding="ISO-8859-1")
users = reviews['User-ID'].values.tolist()
books = reviews['ISBN'].values.tolist()
g = net.from_pandas_edgelist(reviews, 'User-ID', 'ISBN', ['Book-Rating'])
print(len(g))  # 445839
p = bi.projected_graph(g, users)
```

How are you constructing the graph from the frame?

`weighted_projected_graph` takes a bipartite graph as input (but doesn't check whether the input really is bipartite), and generally will not produce a bipartite output. The result should be the same as running the function on each component separately and taking the union, but since the implementation looks at second neighborhoods to get the counts, that shouldn't actually be much faster(?). – Ben Reiniger – 2020-10-12T14:50:16.343

@BenReiniger thanks for that clarification. However, now using `projected_graph` to compute the projection, that is also timing out. I edited the post to include my entire code. – Sam cd – 2020-10-13T14:21:59.823
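Ben Reiniger's per-component observation can be sketched on a toy bipartite graph (this is an illustration under assumed data, not the poster's actual frame):

```python
import networkx as net
from networkx.algorithms import bipartite as bi

# small bipartite graph: users u*, books b*, in two components
g = net.Graph()
g.add_edges_from([("u1", "b1"), ("u2", "b1"), ("u3", "b1"), ("u3", "b2"),
                  ("u4", "b3"), ("u5", "b3")])
users = ["u1", "u2", "u3", "u4", "u5"]

# project each connected component separately, then take the union;
# the result should match projecting the whole graph at once
parts = []
for comp in net.connected_components(g):
    sub = g.subgraph(comp)
    comp_users = [u for u in users if u in comp]
    parts.append(bi.projected_graph(sub, comp_users))
p = net.union_all(parts)
print(p.number_of_edges())  # 4: u1-u2, u1-u3, u2-u3 via b1, and u4-u5 via b3
```

Splitting the work this way at least bounds each projection by its component size, which can help if the graph has many small components.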