牧羊人nacy
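Here is a generic working example for `<table>` markup.

The `tableDataText` function parses a `<table>` element made up of `<tr>` (table row) tags with inner `<td>` (table data) tags. It returns a list of rows, each a list of cell strings. Only one `<th>` (table header) row is accepted, and it must be the first row.

    def tableDataText(table):
        """Return the table as a list of rows, each a list of cell strings."""
        rows = []
        trs = table.find_all('tr')
        if not trs:  # no rows at all: nothing to extract
            return rows
        headerow = [td.get_text(strip=True) for td in trs[0].find_all('th')]  # header row
        if headerow:  # if there is a header row, include it first
            rows.append(headerow)
            trs = trs[1:]
        for tr in trs:  # for every remaining table row
            rows.append([td.get_text(strip=True) for td in tr.find_all('td')])  # data row
        return rows

Using it on the parsed table `htmltable`, we get (first two rows):

    list_table = tableDataText(htmltable)
    list_table[:2]

    [['Rank', 'Name', "GDP (IMF '19)", "GDP (UN '16)", 'GDP Per Capita', '2019 Population'],
     ['1', 'United States', '21.41 trillion', '18.62 trillion', '$65,064', '329,064,917']]

The list can easily be converted to a `pandas.DataFrame` to work with more advanced tools:

    import pandas as pd
    dftable = pd.DataFrame(list_table[1:], columns=list_table[0])
    dftable.head(4)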
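For comparison, the same rows-of-cells idea can be sketched without any third-party parser, using only the standard library's `html.parser`. This is not the method used by `tableDataText`; it is a swapped-in technique, and the names `TableRows`, `table_rows`, and `sample_html` are hypothetical, introduced only for this illustration. The sketch assumes well-formed markup with explicitly closed `<tr>`, `<th>`, and `<td>` tags and no nested tables, and unlike `tableDataText` it treats `<th>` and `<td>` cells alike in every row.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html):
    """Return all table rows found in an HTML string as lists of strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up snippet for illustration only.
sample_html = """
<table>
  <tr><th>Rank</th><th>Name</th></tr>
  <tr><td>1</td><td> United States </td></tr>
</table>
"""

rows = table_rows(sample_html)
print(rows)  # [['Rank', 'Name'], ['1', 'United States']]
```

The result has the same shape as the list returned by `tableDataText` (header row first, then data rows), so it can be passed to `pd.DataFrame(rows[1:], columns=rows[0])` in the same way.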