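I am trying to create a word cloud in Python after cleaning a text file.
I get the result I need, i.e. the words used most often in the text file, but I am unable to plot them.
My code: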
import collections
from wordcloud import WordCloud
import matplotlib.pyplot as plt

file = open('example.txt', encoding='utf8')
stopwords = set(line.strip() for line in open('stopwords'))
wordcount = {}
for word in file.read().split():
    word = word.lower()
    word = word.replace(".", "")
    word = word.replace(",", "")
    word = word.replace("\"", "")
    word = word.replace("“", "")
    if word not in stopwords:
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
d = collections.Counter(wordcount)
for word, count in d.most_common(10):
    print(word, ":", count)
#wordcloud = WordCloud().generate(text)
#fig = plt.figure()
#fig.set_figwidth(14)
#fig.set_figheight(18)
#plt.imshow(wordcloud.recolor(color_func=grey_color, random_state=3))
#plt.title(title, color=fontcolor, size=30, y=1.01)
#plt.annotate(footer, xy=(0, -.025), xycoords='axes fraction', fontsize=infosize, color=fontcolor)
#plt.axis('off')
#plt.show()
Edit:
I tried plotting the word cloud with the following code:
wordcloud = WordCloud(background_color='white',
                      width=1200,
                      height=1000
                      ).generate((d.most_common(10)))
plt.imshow(wordcloud)
plt.axis('off')
plt.show()
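But I get TypeError: expected string or buffer.
When I try the code above with .generate(str(d.most_common(10))) instead,
the resulting word cloud shows an apostrophe (') symbol after several of the words.
Using Jupyter Notebook | python3 | IPython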
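A likely explanation for both symptoms: generate() expects a plain string, while most_common() returns a list of (word, count) tuples, and wrapping that list in str() puts quotes and brackets into the text. The sketch below uses only the standard library to show what such a string looks like once tokenized. The regular expression is an assumption about the library's default token pattern (it may differ between wordcloud versions), and the counts are sample values.

```python
import re
from collections import Counter

# Sample counts standing in for the question's Counter `d`.
counts = Counter({"foo": 17, "bizz": 9, "buzz": 5})

# This is the string that str(d.most_common(...)) hands to generate().
as_text = str(counts.most_common(2))
print(as_text)

# ASSUMPTION: a token is a word character followed by word characters
# or apostrophes. Check the `regexp` default of your installed version.
assumed_pattern = r"\w[\w']+"
tokens = re.findall(assumed_pattern, as_text)
print(tokens)
```

Under that assumed pattern, the closing quote of each tuple entry is kept as part of the word, which would account for the stray apostrophes in the image.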
Solution:
First, download the file Symbola.ttf into the same folder as the script below.
File layout:
file.txt Symbola.ttf my_word_cloud.py
file.txt:
foo buzz bizz foo buzz bizz foo buzz bizz foo buzz bizz foo buzz bizz
foo foo foo foo foo foo foo foo foo foo bizz bizz bizz bizz foo foo
my_word_cloud.py:
import io
from collections import Counter
from os import path
import matplotlib.pyplot as plt
from wordcloud import WordCloud
d = path.dirname(__file__)
# It is important to use io.open with an explicit encoding to load the file as UTF-8
text = io.open(path.join(d, 'file.txt'), encoding='utf-8').read()
words = text.split()
print(Counter(words))
# Generate a word cloud image
# The Symbola font includes most emoji
font_path = path.join(d, 'Symbola.ttf')
word_cloud = WordCloud(font_path=font_path).generate(text)
# Display the generated image:
plt.imshow(word_cloud)
plt.axis("off")
plt.show()
Result:
Counter({'foo': 17, 'bizz': 9, 'buzz': 5})
Many other examples are available; my_word_cloud.py above is a simple one I created for you.
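The script my_word_cloud.py passes the raw text to generate(). If the goal is to plot only the most frequent words from a Counter, as in the question, one option is to pass the counts as a dict to WordCloud.generate_from_frequencies. The following is a minimal sketch, not the answer author's method: top_frequencies and render_cloud are hypothetical helper names, the sample text and stopword set are made up, and the punctuation list simply mirrors the replace() calls in the question.

```python
from collections import Counter


def top_frequencies(text, stopwords=(), n=10):
    """Return the n most frequent non-stopword tokens as a {word: count} dict."""
    # Remove the punctuation the question strips by hand, then lowercase.
    table = str.maketrans("", "", '.,"“”')
    words = (w.translate(table).lower() for w in text.split())
    counts = Counter(w for w in words if w and w not in stopwords)
    return dict(counts.most_common(n))


def render_cloud(frequencies):
    """Draw a word cloud sized by the given counts (needs wordcloud and matplotlib)."""
    # Imported lazily so the counting helper runs without the plotting stack.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(background_color="white", width=1200, height=1000)
    cloud.generate_from_frequencies(frequencies)
    plt.imshow(cloud)
    plt.axis("off")
    plt.show()


# Made-up sample text; replace with the contents of your own file.
sample = "foo buzz bizz foo, foo. the the the the"
freqs = top_frequencies(sample, stopwords={"the"})
print(freqs)
# render_cloud(freqs)  # uncomment to display the image
```

Because the frequencies are passed as a mapping rather than as text, no tokenizing takes place, so the quotes and brackets that str() would add never reach the image.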
Tags: word-cloud, python, matplotlib, plot
Source: https://codeday.me/bug/20191013/1905740.html