BikeDNA(八)外在分析:OSM 与参考数据的比较2
1.数据完整性
见链接
2.网络拓扑结构
见链接
3.网络组件
本节仔细研究两个数据集的网络组件特征。
断开连接的组件不共享任何元素(节点/边)。 换句话说,不存在可以从一个断开连接的组件通向另一组件的网络路径。 如上所述,大多数现实世界的自行车基础设施网络确实由许多断开连接的组件组成(Natera Orozco et al., 2020) 。 然而,当两个断开的组件彼此非常接近时,这可能是边缘缺失或另一个数字化错误的迹象。
方法
为了比较 OSM 和参考数据中断开组件的数量和模式,将内在分析的所有组件结果并置,并生成两个新图,分别显示 OSM 和参考数据的组件间隙以及组件连接性的差异。
解释
许多自行车网络的分散性使得很难评估断开的组件是否是由于缺乏数据质量或缺乏正确连接的自行车基础设施而导致的问题。 比较两个数据集中的断开组件可以更准确地评估断开组件是数据问题还是规划问题。
3.1 断开的组件
print(f"The OSM network in the study area consists of {osm_intrinsic_results['component_analysis']['component_count']} disconnected components."
)
print(f"The {reference_name} network in the study area consists of {ref_intrinsic_results['component_analysis']['component_count']} disconnected components."
)
The OSM network in the study area consists of 356 disconnected components.
The GeoDanmark network in the study area consists of 204 disconnected components.
plot_func.plot_saved_maps([osm_results_static_maps_fp + "all_components_osm",ref_results_static_maps_fp + "all_components_reference",]
)
3.2 组件长度分布
所有网络组件长度的分布可以在所谓的 Zipf 图 中可视化,该图按等级对每个组件的长度进行排序,在左侧显示最大组件的长度,然后是第二大组件的长度,依此类推,直到 右侧最小组件的长度。 当 Zipf 图遵循 双对数比例 中的直线时,这意味着找到小的不连续组件的机会比传统分布的预期要高得多 (Clauset et al., 2009)。 这可能意味着网络没有合并,只有分段或随机添加 (Szell et al., 2022),或者数据本身存在许多间隙和拓扑错误,导致小的断开组件。
但是,也可能发生最大的连通分量(图中最左边的标记,等级为 1 0 0 10^0 100)是明显的异常值,而图的其余部分则遵循不同的形状。 这可能意味着在基础设施层面,大部分基础设施已连接到一个大型组件,并且数据反映了这一点 - 即数据在很大程度上没有受到间隙和缺失链接的影响。 自行车网络也可能介于两者之间,有几个大型组件作为异常值。
在对同一区域进行比较时,如下所示,如果一个数据集在其最大连通分量中显示出明显的异常值,而另一个数据集则没有,并且如果它也至少同样大,则通常可以解释为 更加完整。
plot_func.plot_saved_maps([osm_results_plots_fp + "component_length_distribution_osm",ref_results_plots_fp + "component_length_distribution_reference",],figsize=pdict["fsmap"]
)
3.3 最大连通分量
# Read largest cc
osm_largest_cc = gpd.read_file(osm_results_data_fp + "largest_connected_component.gpkg")
ref_largest_cc = gpd.read_file(ref_results_data_fp + "largest_connected_component.gpkg")print(f"The largest connected component in the OSM network contains {osm_intrinsic_results['component_analysis']['largest_cc_pct_size']:.2f}% of the network length."
)
print(f"The largest connected component in the {reference_name} network contains {ref_intrinsic_results['component_analysis']['largest_cc_pct_size']:.2f}% of the network length."
)
The largest connected component in the OSM network contains 91.47% of the network length.
The largest connected component in the GeoDanmark network contains 80.04% of the network length.
plot_func.plot_saved_maps([osm_results_static_maps_fp + "largest_conn_comp_osm",ref_results_static_maps_fp + "largest_conn_comp_reference",]
)
OSM 和参考网络中最大连接组件的叠加
# Plotset_renderer(renderer_map)
fig, ax = plt.subplots(1, figsize=pdict["fsmap"])osm_largest_cc.plot(ax=ax, linewidth=3.5, color=pdict["osm_base"], label="OSM")
ref_largest_cc.plot(ax=ax, linewidth=1.25, color=pdict["ref_base"], label=reference_name)ax.set_title(f" {area_name}: largest connected components")
ax.set_axis_off()
ax.legend()
cx.add_basemap(ax=ax, crs=study_crs, source=cx_tile_2)plot_func.save_fig(fig, compare_results_static_maps_fp + "largest_cc_overlay_compare")
# Plot again for potential report titlepageset_renderer(renderer_map)
fig, ax = plt.subplots(1, figsize=pdict["fsmap"])osm_largest_cc.plot(ax=ax, linewidth=3, color=pdict["osm_base"], label="OSM")
ref_largest_cc.plot(ax=ax, linewidth=1, color=pdict["ref_base"], label=reference_name)
ax.set_axis_off()plot_func.save_fig(fig, compare_results_static_maps_fp + "titleimage",plot_res="high")
plt.close()
3.4 缺少链接
在组件之间潜在缺失链接的图中,将绘制与另一个组件上的边的指定距离内的所有边。 断开的边缘之间的间隙用标记突出显示。 因此,该地图突出显示了边缘,尽管这些边缘彼此非常接近,但它们是断开连接的,因此不可能在边缘之间的自行车基础设施上骑自行车。
# DEFINE MAX BUFFER DISTANCE BETWEEN COMPONENTS CONSIDERED A GAP/MISSING LINK
component_min_distance = 10assert isinstance(component_min_distance, int) or isinstance(component_min_distance, float
), print("Setting must be integer or float value!")
# Read results with component gapsosm_cg_edge_ids = pd.read_csv(osm_results_data_fp + f"component_gaps_edges_{component_min_distance}.csv"
)["edge_id"].to_list()
osm_component_gaps_edges = osm_edges_simplified.loc[osm_edges_simplified.edge_id.isin(osm_cg_edge_ids)
]ref_cg_edge_ids = pd.read_csv(ref_results_data_fp + f"component_gaps_edges_{component_min_distance}.csv"
)["edge_id"].to_list()
ref_component_gaps_edges = ref_edges_simplified.loc[ref_edges_simplified.edge_id.isin(ref_cg_edge_ids)
]osm_component_gaps = gpd.read_file(osm_results_data_fp + f"component_gaps_centroids_{component_min_distance}.gpkg"
)
ref_component_gaps = gpd.read_file(ref_results_data_fp + f"component_gaps_centroids_{component_min_distance}.gpkg"
)
# Interactive plot of adjacent componentsfeature_groups = []if len(osm_component_gaps_edges) > 0:# Feature groups for OSMosm_edges_simplified_folium = plot_func.make_edgefeaturegroup(gdf=osm_edges_simplified,mycolor=pdict["osm_base"],myweight=pdict["line_base"],nametag="OSM network",show_edges=True,)osm_component_gaps_edges_folium = plot_func.make_edgefeaturegroup(gdf=osm_component_gaps_edges,mycolor=pdict["osm_emp"],myweight=pdict["line_emp"],nametag="OSM: Adjacent disconnected edges",show_edges=True,)osm_component_gaps_folium = plot_func.make_markerfeaturegroup(gdf=osm_component_gaps, nametag="OSM: Component gaps", show_markers=True)feature_groups.extend([osm_edges_simplified_folium,osm_component_gaps_edges_folium,osm_component_gaps_folium,])# Feature groups for reference
if len(ref_component_gaps_edges) > 0:ref_edges_simplified_folium = plot_func.make_edgefeaturegroup(gdf=ref_edges_simplified,mycolor=pdict["ref_base"],myweight=pdict["line_base"],nametag=f"{reference_name} network",show_edges=True,)ref_component_gaps_edges_folium = plot_func.make_edgefeaturegroup(gdf=ref_component_gaps_edges,mycolor=pdict["ref_emp"],myweight=pdict["line_emp"],nametag=f"{reference_name}: Adjacent disconnected edges",show_edges=True,)ref_component_gaps_folium = plot_func.make_markerfeaturegroup(gdf=ref_component_gaps, nametag=f"{reference_name}: Component gaps", show_markers=True)feature_groups.extend([ref_edges_simplified_folium,ref_component_gaps_edges_folium,ref_component_gaps_folium,])m = plot_func.make_foliumplot(feature_groups=feature_groups,layers_dict=folium_layers,center_gdf=osm_nodes_simplified,center_crs=osm_nodes_simplified.crs,
)bounds = plot_func.compute_folium_bounds(osm_nodes_simplified)
m.fit_bounds(bounds)
m.save(compare_results_inter_maps_fp + "component_gaps_compare.html")display(m)
print("Interactive map saved at " + compare_results_inter_maps_fp.lstrip("../") + "component_gaps_compare.html")
Interactive map saved at results/COMPARE/cph_geodk/maps_interactive/component_gaps_compare.html
3.5 每个网格单元的组件
下图显示了与网格单元相交的组件数量。 网格单元中的组件数量过多通常表明网络连接较差 - 要么是由于基础设施分散,要么是因为数据质量问题。
plot_func.plot_saved_maps([osm_results_static_maps_fp + "number_of_components_in_grid_cells_osm",ref_results_static_maps_fp + "number_of_components_in_grid_cells_reference",]
)
3.6 组件连接
在这里,我们可视化每个单元格可以到达的单元格数量之间的差异。 该指标是网络连接性的粗略衡量标准,但具有计算成本低的优点,因此能够快速突出网络连接性的明显差异。
在显示到达的细胞百分比差异的图中,正值表示使用参考数据集的连接性较高,而负值表示可以从 OSM 数据中的特定细胞到达更多的细胞。
plot_func.plot_saved_maps([osm_results_static_maps_fp + "percent_cells_reachable_grid_osm",ref_results_static_maps_fp + "percent_cells_reachable_grid_reference",]
)
# Compute difference in cell reach percentage (where data for both OSM and REF is available)grid["cell_reach_pct_diff"] = (grid["cells_reached_ref_pct"] - grid["cells_reached_osm_pct"]
)
# Plotset_renderer(renderer_map)# norm color bar
cbnorm_diff = colors.Normalize(vmin=-100, vmax=100)fig, ax = plt.subplots(1, figsize=pdict["fsmap"])
from mpl_toolkits.axes_grid1 import make_axes_locatable
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="3.5%", pad="1%")grid.plot(cax=cax,ax=ax,alpha=pdict["alpha_grid"],column="cell_reach_pct_diff",cmap=pdict["diff"],legend=True,norm=cbnorm_diff,
)# Add no data patches
grid[grid["cell_reach_pct_diff"].isnull()].plot(cax=cax,ax=ax,facecolor=pdict["nodata_face"],edgecolor=pdict["nodata_edge"],linewidth= pdict["line_nodata"],hatch=pdict["nodata_hatch"],alpha=pdict["alpha_nodata"],
)# osm_edges_simplified.plot(ax=ax, color=pdict["osm_base"], alpha=1,linewidth=2)
# ref_edges_simplified.plot(ax=ax, color=pdict["ref_base"], alpha=1,linewidth=1)ax.legend(handles=[nodata_patch], loc="upper right")ax.set_title(f"{area_name}: {reference_name} difference to OSM in percent of cells reached"
)
ax.set_axis_off()
cx.add_basemap(ax=ax, crs=study_crs, source=cx_tile_2)plot_func.save_fig(fig, compare_results_static_maps_fp + "percent_cell_reached_diff_compare")
4.概括
# Load results from intrinsic
osm_intrinsic_df = pd.read_csv(osm_results_data_fp + "intrinsic_summary_results.csv",index_col=0,names=["OSM"],header=0,
)ref_intrinsic_df = pd.read_csv(ref_results_data_fp + "intrinsic_summary_results.csv",index_col=0,names=[reference_name],header=0,
)# Drop rows from OSM results not available for reference
osm_intrinsic_df.drop(["Incompatible tag combinations", "Missing intersection nodes"],axis=0,inplace=True,
)# Save new results
osm_intrinsic_df.at["Alpha", "OSM"] = osm_alpha
osm_intrinsic_df.at["Beta", "OSM"] = osm_beta
osm_intrinsic_df.at["Gamma", "OSM"] = osm_gammaref_intrinsic_df.at["Alpha", reference_name] = ref_alpha
ref_intrinsic_df.at["Beta", reference_name] = ref_beta
ref_intrinsic_df.at["Gamma", reference_name] = ref_gamma# Combine
extrinsic_df = osm_intrinsic_df.join(ref_intrinsic_df)
assert len(extrinsic_df) == len(osm_intrinsic_df) == len(ref_intrinsic_df)
extrinsic_df.style.pipe(format_extrinsic_style)
OSM | GeoDanmark | |
---|---|---|
Total infrastructure length (km) | 1,056 | 626 |
Protected bicycle infrastructure density (m/km2) | 5,342 | 2,999 |
Unprotected bicycle infrastructure density (m/km2) | 427 | 455 |
Mixed protection bicycle infrastructure density (m/km2) | 55 | 0 |
Bicycle infrastructure density (m/km2) | 5,825 | 3,454 |
Nodes | 5,016 | 4,125 |
Dangling nodes | 1,828 | 870 |
Nodes per km2 | 28 | 23 |
Dangling nodes per km2 | 10 | 5 |
Overshoots | 8 | 21 |
Undershoots | 18 | 11 |
Components | 356 | 204 |
Length of largest component (km) | 747 | 501 |
Largest component's share of network length | 91% | 80% |
Component gaps | 78 | 52 |
Alpha | 0.11 | 0.10 |
Beta | 1.15 | 1.14 |
Gamma | 0.38 | 0.38 |
5.保存结果
extrinsic_df.to_csv(compare_results_data_fp + "extrinsic_summary_results.csv", index=True
)with open(compare_results_data_fp + f"grid_results_extrinsic.pickle", "wb"
) as f:pickle.dump(grid, f)
from time import strftime
print("Time of analysis: " + strftime("%a, %d %b %Y %H:%M:%S"))
Time of analysis: Mon, 18 Dec 2023 20:25:24