Cloudy With a Chance of Words: Part 1

When working with text data and NLP projects, word frequency is often a useful feature to identify and look into. However, creating good visuals is often difficult because you don't have a lot of options outside of bar charts. Let's face it: bar charts get old and boring quickly! This is where word clouds come into play. In this blog, learn how to spice up your visualizations using word clouds on your next project.

Up until my most recent project, I didn't know a word cloud library existed in Python, but I assure you it does, and it has some amazing features!

The full WordCloud library and documentation can be found here for those interested.

TLDR

Part 1 of this blog will walk you through obtaining the appropriate libraries, the basic parameters and functions of the wordcloud library, and how to create a generic word cloud. Part 2 will build upon this and walk you through creating custom masks for word clouds and other unique visual options.

Getting Started With WordCloud

Before we can start making visuals, we’ll need to make sure we have the libraries we need to create our word clouds. You’ll need the following libraries:

  • numpy
  • matplotlib
  • PIL (installed as the Pillow package)
  • wordcloud
  • nltk (only needed for this blog, as a source of sample text to create word clouds from)

All of these libraries can be installed with pip if you're unable to import them. For my specific project, I used Google Colab, which required a slightly different approach to install wordcloud. Google Colab users can use the following command:

!pip install git+https://github.com/amueller/word_cloud.git#egg=wordcloud

That last part is important for Colab because it identifies and effectively names the library so that it can be properly imported.

Once we have all of our needed libraries installed, we can use the following set of import statements:


We’re now ready to create some word clouds!

Generic Word Clouds

To start with, let's explore generic word clouds. For those who want to follow along, we'll use some corpora from the nltk library.

First off, we'll need to acquire our text. I'll note here that WordCloud can generate a visual from two forms of input. The first, and the main one we'll use, is a string. The second is a dictionary of words and their frequencies as key-value pairs.


If you're following along, or want to attempt this using other sample text from nltk, you can use the following code to see which text samples are available:

This shows a list of the different authors and texts we have to choose from within nltk's Gutenberg files.

Feel free to try creating word clouds from any of the texts in that list. The one we'll continue with in these examples, however, is Moby Dick.

To gather our sample text as a single string you can use the following command:


Now that we have our text, let's take a look at how to turn this into a word cloud. What we're doing in the code block below is instantiating a WordCloud object, then using that object to generate a cloud from the text that we pass in. Once we have the cloud generated, we want to show it without the unnecessary x and y axes.


Look at that! We made a word cloud!

Now personally, I’m not a fan of the black background and it seems a little small, so let’s change that with some simple parameters.


Now we're talking! Although there seem to be some strange things showing up in our generic word cloud, don't there?

Parameters and Language Processing

Looking at the cloud above, we notice a few things. Some words seem to be paired:

  • the whale
  • the ship
  • the sea
  • the captain
  • White Whale

And so on. Our word cloud is still showing word frequencies; however, one of the parameters WordCloud has is 'collocations', which defaults to True. With it enabled, the library also looks at pairs of words and their frequencies. In some instances this can definitely be useful, but in this one I think we'll get better results without it.


Notice the difference?

A keen eye may recognize that the word ‘the’ no longer appears in our word cloud. This is because ‘the’ is recognized as a stop-word and excluded from the cloud even though it appears quite frequently in the text.

You may be wondering where stop-words came into play, and that is one of the really cool features of the wordcloud library. The library comes with its own list of stop-words that it uses by default. It actually applies quite a few NLP practices by default, which makes creating the clouds that much easier while staying adjustable for the more experienced NLP practitioner. Some of these additional NLP parameters are:

  • regexp: an optional parameter that, if left blank, will use r"\w[\w']+" by default. A custom regex string can be passed in here.
  • normalize_plurals: default = True. For words that appear both with and without a trailing 's', the 's' is removed from the plural and its count is added to the singular form.

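To see what the default pattern mentioned above actually matches, here is a small sketch using Python's re module directly. It only mimics the tokenizing step; the sample sentence is mine, and the library does further processing afterwards.

```python
import re

# The default pattern named above: a word character followed by one or more
# word characters or apostrophes, so every token has at least two characters.
DEFAULT_PATTERN = r"\w[\w']+"

sample = "Ahab's ship, the Pequod, a whaler."
tokens = re.findall(DEFAULT_PATTERN, sample)
print(tokens)  # ["Ahab's", 'ship', 'the', 'Pequod', 'whaler']
```

Notice that the single-letter word "a" is dropped because the pattern requires at least two characters, while the apostrophe in "Ahab's" is kept inside the token.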

In our original import statement, we imported STOPWORDS from the wordcloud library. You can print it to see the entire list of words that are excluded by default; it currently contains 192 of the most common stop-words. You can add to this set if you have additional words you want excluded, or supply your own stop-words if you prefer. Note that the stop-words must be passed in as a set and not a list.


What a difference!

One last thing we’ll talk about before moving on to making fun and unique word clouds is “relative scaling”.

Relative scaling determines how strongly the size of a word depends on its frequency. By default, relative scaling is effectively 0.5 (the 'auto' setting resolves to 0.5 unless repeat is enabled), which balances a word's rank against its raw frequency, so a word that occurs twice as often as another is drawn larger, but not twice as large.

Relative scaling can be set to any number between 0 and 1. At 0, only the rank of each word is considered, so sizes say little about how much more frequent one word is than another; at 1, a word that occurs twice as often will be twice as large. In some cases a higher value can be useful to better identify the differences in frequency. However, this doesn't always look very good and can affect the fit of a word cloud to a mask, which we will talk about later.


In this case, using a relative scaling of 1 actually doesn’t look too bad! We’ll soon see how this translates to using it with an image mask.

Saving Your Word Cloud

Once you have your word cloud the way you want it, you’ll probably want to save it. To do so, you can run the following code which will save the current state of your WordCloud object.


Keep in mind this will save the image to your current working directory. If you have a specific location in mind, you will need to include the appropriate path.

Other Parameters Worth Playing With

We looked at the key parameters for making word clouds, but there are many more that are worth looking into and toying with. These parameters are fairly self-explanatory and can be used to further tweak your clouds:

  • prefer_horizontal: (float) If set to 1, all words will appear horizontal, while lower values will increase the frequency of vertical words. default = 0.9
  • min_font_size: (int) Smallest font size to be used. default = 4
  • max_words: (int) Maximum number of words shown in the cloud. default = 200
  • min_word_length: (int) Minimum number of letters required for a word to be in the cloud. default = 0
  • include_numbers: (bool) Whether numbers are included in the cloud. default = False
  • repeat: (bool) Determines if words/phrases will be repeated until max_words or min_font_size is reached. (Can be used to create word clouds from a single word.) default = False


Unique and Custom Word Clouds

Because this blog turned out much longer than I had initially planned, I'll discuss using image masks to create custom word clouds, how to create your own image masks from any image, and how to apply an image's colors to your cloud in the soon-to-follow Part 2 of this blog.

Translated from: https://medium.com/swlh/cloudy-with-a-chance-of-words-part-1-d34a29739dba
