Infomap Data-Processing Examples

Process Infomap .ftree files into module communities and edge data per module community

import nttc

# 1. Read each .ftree file in the directory and store its lines as a list in a per-period dict
ftree_path = '../infomap/output/nets/ftree/ftree'

# regex is the pattern that picks the period number out of each file name in the dedicated directory,
# e.g., r"\d{1,2}" will match the '1' in p1_ftree.ftree
dict_map = nttc.batch_map(regex=r"\d{1,2}", path=ftree_path, file_type='ftree')

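# Print sample ftree modules
# Note: the slice below ends at absolute line index 5, not 5 lines past the start,
# so with a start index of 2 it returns three lines.
module_start = dict_map['1']['indices']['ftree_modules'][0]
print(
    '1.\nIndices: ',
    dict_map['1']['indices']['ftree_modules'],
    '\n\nFirst 5 file lines of module section: ',
    dict_map['1']['lines'][module_start:5],
    '\n\n'
)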

# Print sample ftree links
links_start = dict_map['1']['indices']['ftree_links']['1']['indices'][0]
links_end = links_start + 5  # stop five lines after the start of the links section
print(
    '2.\nIndices for module 1 links: ',
    dict_map['1']['indices']['ftree_links']['1']['indices'],
    '\n\nFirst 5 lines of period 1, module 1 links section: ',
    dict_map['1']['lines'][links_start:links_end],
    '\n\n'
)

Output:

1.
Indices:  [2, 50521] 

First 5 file lines of module section:  ['1:1 0.156246 "username1" 4', '1:2 0.138213 "username2" 294', '1:3 0.00534793 "username3" 533'] 


2.
Indices for module 1 links:  [50525, 95864] 

First 5 lines of period 1, module 1 links section:  ['2 1 0.00383033', '5 1 0.00319596', '1359 1 0.00299684', '1359 2 0.00298003', '28 1 0.0025742']
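
The regex argument described above can be tried in isolation. The following is a minimal sketch using only Python's re module; it does not call nttc, and how batch_map applies the pattern internally is an assumption. Every file name other than p1_ftree.ftree is hypothetical.

```python
import re

# Only p1_ftree.ftree appears in the text above; the other names are hypothetical.
file_names = ["p1_ftree.ftree", "p2_ftree.ftree", "p10_ftree.ftree"]

period_pattern = re.compile(r"\d{1,2}")


def period_key(file_name):
    """Return the first run of one or two digits in the file name, or None."""
    match = period_pattern.search(file_name)
    return match.group(0) if match else None


# Map each file name to the period key the pattern would extract.
period_keys = {name: period_key(name) for name in file_names}
print(period_keys)
```

Because the pattern matches at most two digits, a name containing a longer digit run (for example a year) would yield only its first two digits, so the file names in the directory should carry the period number as their first digits.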

# Check output
dict_map['1']['indices']['ftree_links']['1']

Output:

{'exit_flow': '0.0',
 'indices': [50525, 95864],
 'num_children': '7362',
 'num_edges': '45339'}
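
In the sample above, the difference between the two indices equals num_edges (95864 - 50525 = 45339), which suggests the pair can be read as a half-open [start, end) span over the file lines. That reading is an inference from the sample, not documented nttc behavior. A small sketch of the consistency check, with hypothetical helper names:

```python
# Values copied from the sample output above.
link_info = {
    'exit_flow': '0.0',
    'indices': [50525, 95864],
    'num_children': '7362',
    'num_edges': '45339',
}


def span_length(info):
    """Number of lines covered if 'indices' is read as a half-open [start, end) span."""
    start, end = info['indices']
    return end - start


def matches_edge_count(info):
    """True when the span length agrees with the reported number of edges."""
    return span_length(info) == int(info['num_edges'])


print(span_length(link_info))
print(matches_edge_count(link_info))
```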

# Note: this assignment binds a second name to the same dict; it does not make a copy.
copy_dict_map = dict_map
# Process each period's module edge data and store it as a DataFrame.
dict_with_edges = nttc.ftree_edge_maker(copy_dict_map)

Output for 10 periods:

Processing edge data for period 1
Processing edge data for period 2
Processing edge data for period 3
Processing edge data for period 4
Processing edge data for period 5
Processing edge data for period 6
Processing edge data for period 7
Processing edge data for period 8
Processing edge data for period 9
Processing edge data for period 10
Processing complete!

# Check sample of dataframe output
dict_with_edges['2']['indices']['ftree_links']['2']['df_edges'][:10]

Output (dataframe):

index  source  target  directed_count
0      2       1       0.0146604
1      7       1       0.0069932
2      192     1       0.00081632
3      639     1       0.000395405
4      109     1       0.000299742
5      3       1       0.000294408
6      4       1       0.000266507
7      261     1       0.00022959
8      525     1       0.000165815
9      747     1       0.000146682
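
To illustrate how raw link lines relate to the source, target, and directed_count columns, here is a minimal standard-library sketch. It is not the ftree_edge_maker implementation; parse_edge_line is a hypothetical helper, and keeping the ids as strings is an assumption. The input lines are taken from the period 1, module 1 sample printed earlier.

```python
# Link lines copied from the period 1, module 1 sample output.
edge_lines = ['2 1 0.00383033', '5 1 0.00319596', '1359 1 0.00299684']


def parse_edge_line(line):
    """Split one 'source target flow' line into a row dict."""
    source, target, flow = line.split()
    return {'source': source, 'target': target, 'directed_count': float(flow)}


edge_rows = [parse_edge_line(line) for line in edge_lines]
for row in edge_rows:
    print(row)
```

A list of row dicts like this can be passed to pandas.DataFrame to obtain a table with the same three columns.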

# For each period and module, build the hub lists from the full listified .ftree file and store them in a dict
new_dict = dict_with_edges
dh = nttc.infomap_hub_maker(new_dict, file_type='ftree', mod_sample_size=10, hub_sample_size=-1)
print(
    '2.\nSample hub: ',
    dh['1']['info_hub']['1'][:5]
)

Output:

2.
Sample hub:  [{'node': '1', 'name': 'username1', 'score': 0.156246}, {'node': '2', 'name': 'username2', 'score': 0.138213}, {'node': '3', 'name': 'username3', 'score': 0.00534793}, {'node': '4', 'name': 'username4', 'score': 5.96884e-05}, {'node': '5', 'name': 'username5', 'score': 5.59752e-05}]
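
The module lines printed in the first example (such as '1:1 0.156246 "username1" 4') carry the fields that end up in each hub entry. The sketch below shows one way to parse such lines and group them by top-level module. It is an assumption-laden illustration, not infomap_hub_maker itself: the field names path and module are hypothetical, and which field nttc stores under 'node' is not settled by the sample outputs.

```python
import re

# Pattern for lines like: 1:1 0.156246 "username1" 4
MODULE_LINE = re.compile(
    r'^(?P<path>\S+)\s+(?P<score>\S+)\s+"(?P<name>.*)"\s+(?P<node>\S+)$'
)


def parse_module_line(line):
    """Parse one module-section line into a dict; raise ValueError if it does not fit."""
    match = MODULE_LINE.match(line)
    if match is None:
        raise ValueError(f'Unrecognized module line: {line!r}')
    path = match.group('path')
    return {
        'path': path,
        'module': path.split(':')[0],  # top-level module id (assumed meaning)
        'name': match.group('name'),
        'node': match.group('node'),
        'score': float(match.group('score')),
    }


# Lines copied from the sample output of the module section.
module_lines = [
    '1:1 0.156246 "username1" 4',
    '1:2 0.138213 "username2" 294',
    '1:3 0.00534793 "username3" 533',
]

# Group parsed entries by top-level module.
hubs = {}
for line in module_lines:
    entry = parse_module_line(line)
    hubs.setdefault(entry['module'], []).append(entry)

print(hubs['1'][0])
```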

# Write edge and node lists per module.
# Arguments: number of periods, number of modules, dict of module data from infomap_hub_maker
dict_full = nttc.networks_controller(10, 10, dh)

Output as DataFrame:

   directed_count  source  source_name  target  target_name
0  0.00808964      8       username2    1       username1
1  0.00648447      11      username4    1       username3
2  0.00527613      6       username6    1       username1
3  0.00361715      18      username8    1       username1
4  0.00356268      4       username10   1       username4
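
The source_name and target_name columns pair each edge endpoint with a username. A minimal sketch of that labeling step follows, assuming a plain id-to-name lookup is available; the lookup, the edge rows, and label_edges are all hypothetical and do not reflect how networks_controller resolves names.

```python
# Hypothetical id-to-name lookup and edge rows; values are illustrative only.
names_by_id = {'1': 'username1', '2': 'username2'}
edges = [
    {'source': '2', 'target': '1', 'directed_count': 0.0146604},
    {'source': '7', 'target': '1', 'directed_count': 0.0069932},
]


def label_edges(edge_rows, names):
    """Return new rows with source_name and target_name added (None if unknown)."""
    labeled = []
    for row in edge_rows:
        labeled.append({
            **row,
            'source_name': names.get(row['source']),
            'target_name': names.get(row['target']),
        })
    return labeled


labeled_edges = label_edges(edges, names_by_id)
for row in labeled_edges:
    print(row)
```

Using dict.get keeps edges whose endpoint is missing from the lookup, which makes gaps visible as None rather than raising a KeyError.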

Tally infomap scores per hub

dhn = dh
totals_dhn = nttc.score_summer(dhn, hub_sample_size=50)
# Updated hubs with scores
totals_dhn['1']['info_hub']['1'][:2]

Output:

[{'name': 'username1',
  'node': '4',
  'score': 0.156246,
  'total_hub_flow_score': 0.30013800000000007,
  'total_period_flow_score': 0.5393100000000002},
 {'name': 'username2',
  'node': '294',
  'score': 0.138213,
  'total_hub_flow_score': 0.30013800000000007,
  'total_period_flow_score': 0.5393100000000002}]
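
In the output above, both nodes of hub 1 share the same total_hub_flow_score and total_period_flow_score, which is consistent with a per-hub sum and a per-period sum of the node scores. The sketch below implements that reading; it is an assumption about what score_summer computes, not its actual code, and add_flow_totals is a hypothetical name. Only the first two scores come from the sample; the third is made up.

```python
import math

# Hypothetical input in the same nested shape as the hub dict above.
hubs_by_period = {
    '1': {
        'info_hub': {
            '1': [
                {'name': 'username1', 'score': 0.156246},
                {'name': 'username2', 'score': 0.138213},
            ],
            '2': [
                {'name': 'username9', 'score': 0.05},  # made-up entry
            ],
        }
    }
}


def add_flow_totals(periods, hub_sample_size=50):
    """Return a new structure with hub and period flow totals on every node."""
    result = {}
    for period, data in periods.items():
        # Sum only the first hub_sample_size nodes of each hub.
        hub_totals = {
            hub: sum(node['score'] for node in nodes[:hub_sample_size])
            for hub, nodes in data['info_hub'].items()
        }
        period_total = sum(hub_totals.values())
        result[period] = {
            'info_hub': {
                hub: [
                    {**node,
                     'total_hub_flow_score': hub_totals[hub],
                     'total_period_flow_score': period_total}
                    for node in nodes
                ]
                for hub, nodes in data['info_hub'].items()
            }
        }
    return result


totals = add_flow_totals(hubs_by_period)
first = totals['1']['info_hub']['1'][0]
print(first['total_hub_flow_score'], first['total_period_flow_score'])
```

Building a new structure instead of updating the input in place keeps the original hub dict unchanged, which matters when several names refer to the same dict.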

# Example process to append period and community module labels
tdhn = totals_dhn
for p in tdhn:
    for h in tdhn[p]['info_hub']:
        for n in tdhn[p]['info_hub'][h]:
            n.update({'period': p, 'community': h})

tdhn['1']['info_hub']['1'][:2]

Output:

[{'community': '1',
  'name': 'username1',
  'node': '4',
  'period': '1',
  'score': 0.156246},
 {'community': '1',
  'name': 'username2',
  'node': '294',
  'period': '1',
  'score': 0.138213}]
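
Once every node carries its period and community labels, the nested dict can be flattened into one list of records, which is convenient for tabular tools. This is a minimal sketch under the assumption that the structure matches the output above; flatten_hubs is a hypothetical helper and not part of nttc.

```python
# Input in the same shape as the labeled output above (values copied from it).
labeled_hubs = {
    '1': {
        'info_hub': {
            '1': [
                {'community': '1', 'name': 'username1', 'node': '4',
                 'period': '1', 'score': 0.156246},
                {'community': '1', 'name': 'username2', 'node': '294',
                 'period': '1', 'score': 0.138213},
            ]
        }
    }
}


def flatten_hubs(periods):
    """Collect every node dict into a single flat list, tagging period and community."""
    rows = []
    for period, data in periods.items():
        for community, nodes in data['info_hub'].items():
            for node in nodes:
                # Copy the node so the nested structure is left untouched.
                rows.append({**node, 'period': period, 'community': community})
    return rows


flat_rows = flatten_hubs(labeled_hubs)
print(len(flat_rows), flat_rows[0]['name'])
```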