Yunshang Film Production Management System
Data visualization is a key step in any data science project. During exploratory data analysis, visualizing the data lets us locate outliers and identify distributions, which helps us control for possible biases early on. Coupled with simple statistical tests, it can also answer many of our questions and help us prioritize areas to focus on.
Here, I will go through some exploratory data analysis and data visualization steps in Python using the Matplotlib and Seaborn libraries. The goal of the project is to analyze movie trends of the past decade and make recommendations for developing a new movie studio brand for a well-established corporation.
Approach
We explored the data with two primary goals in mind:
Building a global brand: we don’t just make movies, we make good movies that appeal to a global audience.
Establishing a sustainable long-term plan: we make a sustainable business plan, not just a movie production plan.
Data Structure
This is the basic structure of our cleaned pandas data frame. We sourced our data from The Movie Database (TMDB), IMDb, and The Numbers. I recommend using the TMDB API for the preliminary movie data.
Exploration
Initial Setup
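The original setup code did not survive in this copy of the article. A minimal sketch of what such a setup might look like is shown here; the style, the figure size, and the file name are assumptions, not the author's actual choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Shared plot styling; the style and figure size are arbitrary choices.
sns.set_theme(style="whitegrid")
plt.rcParams["figure.figsize"] = (10, 5)

# Hypothetical file name: the article does not say how the cleaned data is stored.
# df = pd.read_csv("movies_clean.csv")
```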
Distribution of Gross Revenue
Let’s start by looking at the distribution of domestic and worldwide gross revenue. Seaborn’s distplot plots a histogram along with a KDE (kernel density estimate) curve.
We can see that the distribution is strongly right-skewed, which is a common pattern for revenue data. Taking the log transformation of this data can help us see more clearly what’s happening in the dense area.
Not surprisingly, the global market seems to yield higher revenue on average. Let’s look at the relationship between budget and revenue.
Budget to Revenue
Now we want to visualize the relationship between production budget and gross revenue, two continuous variables, using scatter plots. There are many ways to do this. Here, I overlaid two scatter plots to look at global and domestic gross revenue together.
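A minimal sketch of two overlaid scatter plots with Matplotlib follows. The data is synthetic, and dom_gross is an assumed column name; budget and glob_gross are the names used in the article's own code.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic budgets and revenues; the values are made up.
rng = np.random.default_rng(2)
budget = rng.uniform(1e6, 2e8, size=200)
df = pd.DataFrame({
    "budget": budget,
    "dom_gross": budget * rng.uniform(0.2, 2.0, size=200),   # hypothetical column name
    "glob_gross": budget * rng.uniform(0.5, 5.0, size=200),
})

fig, ax = plt.subplots(figsize=(8, 6))
# Drawing both series on the same Axes overlays them; alpha keeps overlaps readable.
ax.scatter(df["budget"], df["glob_gross"], alpha=0.5, label="Worldwide gross")
ax.scatter(df["budget"], df["dom_gross"], alpha=0.5, label="Domestic gross")
ax.set_xlabel("Production budget (USD)")
ax.set_ylabel("Gross revenue (USD)")
ax.legend()
```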
It seems that a high budget does not always lead to high revenue, especially in the domestic market. Also, some movies yield high revenue on relatively low budgets when they target the global market. Let’s take a closer look at which genres might yield the highest return on investment.
Distribution of Genre
We can look at the percentage of each genre in our dataset using a bar plot.
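The percentages can be computed with value_counts(normalize=True) and then passed to a bar plot. The sketch below uses a tiny made-up data frame with one genre per movie, which is an assumption about how the cleaned data is organized.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Tiny made-up sample with one genre label per movie.
df = pd.DataFrame({
    "genre": ["Action"] * 3 + ["Drama"] * 3 + ["Horror"] * 2 + ["Animation"] * 2
})

# Share of each genre as a percentage, largest first.
pct = df["genre"].value_counts(normalize=True).mul(100)
print(pct.round(1))

fig, ax = plt.subplots(figsize=(7, 4))
sns.barplot(x=pct.to_numpy(), y=pct.index.to_numpy(), color="steelblue", ax=ax)
ax.set_xlabel("Share of movies (%)")
ax.set_ylabel("Genre")
```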
We see that about 30% of the movies in our data are action movies.
Revenue to Cost Ratio of Each Genre
Which genres have the highest return on investment?
Based on the ratio of global gross revenue to budget, horror films on average yield the highest return per dollar invested. But this does not necessarily mean that horror movies bring in the most profit. Horror movies might need a smaller production budget, which yields a higher percentage return on cost. We can compare the budgets of each genre using a box plot.
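A sketch of both steps, the revenue-to-budget ratio per genre and the budget box plot, is given below. It assumes the ratio is averaged per movie within each genre (the article does not say whether it averaged ratios or divided totals), and the numbers are made up.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up sample; column names follow the article's own code.
df = pd.DataFrame({
    "genre": ["Horror", "Horror", "Action", "Action", "Animation", "Animation"],
    "budget": [5e6, 1e7, 1.5e8, 2e8, 9e7, 1.2e8],
    "glob_gross": [4e7, 6e7, 4.5e8, 5e8, 4e8, 5e8],
})

# Per-movie ratio, then the mean ratio within each genre (an assumed definition).
df["gross_to_budget"] = df["glob_gross"] / df["budget"]
mean_ratio = (
    df.groupby("genre")["gross_to_budget"].mean().sort_values(ascending=False)
)
print(mean_ratio.round(2))

# Box plot of production budget by genre, to see what each genre costs to make.
fig, ax = plt.subplots(figsize=(7, 4))
sns.boxplot(data=df, x="genre", y="budget", ax=ax)
ax.set_ylabel("Production budget (USD)")
```

In this toy sample the low-budget genre has the highest ratio even though its absolute revenue is the smallest, which is the effect the paragraph above describes.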
Average Production Budget of Each Genre
As we suspected, horror movies usually require little budget to start out. On the other hand, action, animation, and some family films tend to have higher budgets. So which genres yield the most profit? (Here I’m using the term “profit” loosely to mean global gross revenue minus the production budget. In reality, we cannot know the total cost of production, distribution, and marketing, so we cannot fully validate this measure.)
Profit of Each Genre
(code is similar to above)
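For reference, profit as defined earlier (global gross revenue minus production budget) can be computed and compared across genres along these lines. The data is made up, and the use of a box plot here is an assumption about the original figure.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up sample; column names follow the article's own code.
df = pd.DataFrame({
    "genre": ["Horror", "Horror", "Action", "Action", "Animation", "Animation"],
    "budget": [5e6, 1e7, 1.5e8, 2e8, 9e7, 1.2e8],
    "glob_gross": [4e7, 6e7, 4.5e8, 5e8, 4e8, 5e8],
})

# "Profit" in the article's loose sense: global gross minus production budget.
df["profit"] = df["glob_gross"] - df["budget"]
mean_profit = df.groupby("genre")["profit"].mean().sort_values(ascending=False)
print(mean_profit)

# Order the boxes from highest to lowest mean profit.
fig, ax = plt.subplots(figsize=(7, 4))
sns.boxplot(data=df, x="genre", y="profit", order=list(mean_profit.index), ax=ax)
ax.set_ylabel("Profit (USD)")
```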
In fact, the genre that usually yields the highest profit is animation, followed by family and action. We can also look at the relationship between production budget and gross revenue for each genre with a linear model plot.
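Seaborn's lmplot draws a scatter plot with a fitted regression line and can split the data into one panel per genre. The sketch below uses synthetic data with made-up slopes; the panel layout is an assumption, not necessarily the layout of the original figure.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: each genre gets its own made-up budget-to-revenue slope.
rng = np.random.default_rng(3)
frames = []
for genre, slope in [("Action", 3.0), ("Animation", 4.0), ("Horror", 2.0)]:
    budget = rng.uniform(1e6, 2e8, size=40)
    gross = np.clip(slope * budget + rng.normal(0, 5e7, size=40), 0, None)
    frames.append(pd.DataFrame({"genre": genre, "budget": budget, "glob_gross": gross}))
df = pd.concat(frames, ignore_index=True)

# One regression panel per genre; ci=None skips the bootstrapped confidence band.
g = sns.lmplot(data=df, x="budget", y="glob_gross", col="genre", col_wrap=3,
               height=3.5, ci=None, scatter_kws={"alpha": 0.5})
g.set_axis_labels("Production budget (USD)", "Global gross revenue (USD)")
```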
Looking at the linear model plot, it’s clear that, with very few exceptions, horror movies are low-cost and do not bring in much revenue. Also, the high average profit for adventure films seems to come from a handful of rare successes. The feasible money-makers appear to be action and animation. Action shows a stronger correlation between budget and gross revenue, while animation seems to allow some big successes on relatively lower budgets.
We can compute the correlation for each genre to confirm this.
# Pearson correlation between budget and global gross revenue, per genre
for g in df['genre'].unique():
    subset = df[df['genre'] == g]
    corr = subset['budget'].corr(subset['glob_gross'])
    print(f"{g}: {round(corr, 2)}")
# Action: 0.74
# Animation: 0.60
# Slightly higher correlation between global gross revenue and budget for action films.
But profit is not everything. As a brand-new studio, we want to build a reputation and raise our brand image to the level of other established studio brands. This requires making reputable, award-worthy movies as well as popular movies that go viral. Let’s see which genres tend to earn this status.
Ratings
Most horror movies don’t get high average ratings on IMDb, while biography and drama films tend to do well. We should investigate which types of biography or drama films are worth investing in. On the other hand, the all-around winner seems to be animation, which often yields both high revenue and high ratings. The only downside is that award opportunities for animated films are relatively slim.
Popularity
Based on the TMDB popularity score, action, adventure, and animation are the most popular genres, while comedy, horror, and biography films tend to be less popular. For building a global brand presence and high profit, action, adventure, and animation are good areas to target. We will look at these three genres first.
Superhero Action Films
One thing that stood out in our dataset was that 3 of the top 5 action movies by profit were superhero movies from Marvel. The superhero film market has skyrocketed in the past decade and will be a difficult wall for a new studio to break through, since most of these films are sequels built on deep-rooted fandoms. So I decided to flag these superhero films based on the names of their writers and directors by adding a new column, ‘superhero’.
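One possible way to build such a flag is to match the credit strings against a list of names. In the sketch below, the writers and directors column names, the placeholder rows, and the name list are all assumptions for illustration; the article does not say which names or columns it matched on.

```python
import pandas as pd

# Placeholder rows; "writers" and "directors" are hypothetical column names.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "writers": ["Stan Lee, Jane Doe", "John Roe", None],
    "directors": ["Someone Else", "Another Person", "Third Person"],
})

# Illustrative name list only, not the list the author used.
SUPERHERO_CREATORS = ["Stan Lee", "Jack Kirby"]
pattern = "|".join(SUPERHERO_CREATORS)

# Combine both credit fields, treating missing values as empty strings.
credits = df["writers"].fillna("") + ", " + df["directors"].fillna("")

# True when any listed name appears in the combined credits.
df["superhero"] = credits.str.contains(pattern, case=False, regex=True)
print(df[["title", "superhero"]])
```

Matching on credits is only a heuristic: it can miss superhero films by other creators and can flag unrelated films that share a credited name.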
A swarm plot is a good way to look at the distribution of a continuous value across two categorical variables. Here, we can see that a big chunk of high-profit action movies are indeed superhero films. Also, though not shown here, most of the successful non-superhero films are sequels (for both action and animation). It might be worthwhile to add a sequel flag as a feature for deeper analysis.
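A swarm plot of this kind can be sketched with seaborn's swarmplot, using genre on the x axis and the superhero flag as the hue. The data is synthetic, and mapping the boolean flag to text labels is a presentation choice made for this example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic data; the values are made up.
rng = np.random.default_rng(4)
n = 60
df = pd.DataFrame({
    "genre": rng.choice(["Action", "Animation"], size=n),
    "superhero": rng.choice([True, False], size=n),
    "profit": rng.gamma(shape=2.0, scale=1e8, size=n),
})
# Readable legend labels for the boolean flag.
df["kind"] = df["superhero"].map({True: "Superhero", False: "Other"})

fig, ax = plt.subplots(figsize=(8, 5))
# x and hue are the two categorical variables; y is the continuous one.
sns.swarmplot(data=df, x="genre", y="profit", hue="kind", dodge=True, size=4, ax=ax)
ax.set_ylabel("Profit (USD)")
```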
Action, Animation, Adventure
We can see here that animation tends, on average, to be more successful both globally and domestically.
Award Winning Films
So far we have established that, given a high budget, animation is perhaps a less risky genre to invest in. But we also want to invest in non-animation films to improve our chances of winning awards and establishing a reputation. Earlier we saw that biography and drama films tend to be rated highly.
This plot shows that a higher rating is generally associated with higher profit, but not by much. There also seem to be some drama films that follow a different trend. We should look further into the sub-genres of drama films.
A strip plot is a scatter plot for a categorical variable; it adds a bit of horizontal jitter, making it easier to see the density of values. It’s hard to observe strong trends here, as there are too many categories and not enough observations, other than that many of the drama films have romance as a sub-genre.
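A strip plot along these lines can be drawn with seaborn's stripplot. The sub_genre and imdb_rating column names, the sub-genre labels, and the choice of rating as the y variable are assumptions; the data is synthetic.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic drama films with a made-up sub-genre label and rating.
rng = np.random.default_rng(5)
df = pd.DataFrame({
    "sub_genre": rng.choice(["Romance", "Crime", "History", "War"],
                            size=80, p=[0.4, 0.2, 0.2, 0.2]),
    "imdb_rating": np.clip(rng.normal(6.8, 0.8, size=80), 1, 10),
})

fig, ax = plt.subplots(figsize=(8, 4))
# jitter spreads the points horizontally inside each category so overlaps are visible.
sns.stripplot(data=df, x="sub_genre", y="imdb_rating", jitter=0.25, alpha=0.6, ax=ax)
ax.set_xlabel("Drama sub-genre")
ax.set_ylabel("IMDb rating")
```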
A simple t-test showed a statistically significant difference in average IMDb rating between drama and biography films (p < 0.01), but not in profit or budget. So we should focus on making biography films instead.
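A two-sample t-test like this can be run with scipy.stats.ttest_ind. The sketch below uses synthetic ratings and Welch's variant (equal_var=False), which is an assumption, since the article does not say which form of the test it used; imdb_rating is also an assumed column name.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic ratings; the group means and sizes are made up.
rng = np.random.default_rng(6)
df = pd.DataFrame({
    "genre": ["Drama"] * 200 + ["Biography"] * 60,
    "imdb_rating": np.concatenate([
        rng.normal(6.5, 0.8, size=200),
        rng.normal(7.1, 0.6, size=60),
    ]),
})

drama = df.loc[df["genre"] == "Drama", "imdb_rating"]
biography = df.loc[df["genre"] == "Biography", "imdb_rating"]

# Welch's t-test does not assume the two groups have equal variance.
t_stat, p_value = stats.ttest_ind(biography, drama, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

The sign of the t statistic shows which group has the higher mean, which is worth checking before drawing a conclusion from the p-value alone.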
Monthly Trend
Lastly, we used line plots to look at the best time of year to release a movie to maximize profit.
Looking at the annual trend, we can see that movies released from April to June tend to yield the highest revenue. This would be a great time to release our globally appealing animation.
Highly acclaimed movies are often released close to the end of the year, during “Oscar season,” to maximize their exposure to critics. We recommend releasing our award-worthy biography films during this time to elevate our brand to the level of other established studios.
Conclusion
We reviewed movie data from the past decade to propose a few recommendations and guidelines for starting a movie studio. Horror movies yield the highest percentage return on investment and require little budget to start out. But horror is not a good genre to start with, as it is usually neither popular nor highly rated, and it does not bring in high revenue. To maximize profit and develop a global presence, we encourage investing in animated films. To target awards and elevate the brand’s reputation, we suggest making biography films. We suggest an annual plan that coordinates production of two separate lines of films (profitable animation and award-worthy biography).
For a more in-depth look at the process, you can check out the GitHub page here. This project was done in collaboration with my colleague Paul Torres.
Translated from: https://medium.com/swlh/future-of-a-movie-studio-29a65fcf48c