Exploratory Analysis of UFRN (Universidade Federal do Rio Grande do Norte) Warehouses


This is my first publication here, so I will start simple.

I want to do an exploratory data analysis of UFRN’s warehouses and answer some questions about the data using Python and Power BI.

I downloaded the dataset from “dados.gov.br”, which is Brazil's open data portal for public data. The dataset I used is available here 👈

QUESTIONS TO BE ANSWERED:

1. What are the items with the greatest quantity in stock?
2. What are the items with the greatest value in stock?
3. Which warehouse has the greatest quantity of items in stock, and which has the lowest?
4. Which warehouse has the greatest value in stock?
5. What are the most expensive items?

So, let’s start by importing pandas in a Jupyter Notebook and opening the CSV file:
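The notebook cell itself is not reproduced in the text, so here is a minimal sketch of that first step. The rows in `SAMPLE` are made up for illustration (only the five column names come from the dataset) and stand in for the downloaded file, which would normally be passed to `pd.read_csv` as a path.

```python
import io

import pandas as pd

# Made-up rows standing in for the downloaded file; only the column names
# come from the real dataset. Fields are separated by semicolons.
SAMPLE = (
    "material;almoxarifado;saldo;preco;valor_total\n"
    "SERINGA 10ML;ALMOXARIFADO A;1000.0;0.25;250.0\n"
    "COPO PLASTICO;ALMOXARIFADO B;40.0;0.01;0.4\n"
)

# Default call: pandas assumes the fields are separated by commas.
df = pd.read_csv(io.StringIO(SAMPLE))

print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # the whole header arrives as a single column name
```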

Here ☝ we have our first problem:

The file looks strange. It seems that all the information is in a single column…

This makes it impossible to analyse the dataframe by column, as we can see below:
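A small sketch of what goes wrong, again on made-up sample rows rather than the real file: because the header was read as one long column name, asking for a single column such as `material` raises a `KeyError`.

```python
import io

import pandas as pd

# Made-up sample rows; only the column names come from the real dataset.
SAMPLE = (
    "material;almoxarifado;saldo;preco;valor_total\n"
    "SERINGA 10ML;ALMOXARIFADO A;1000.0;0.25;250.0\n"
    "COPO PLASTICO;ALMOXARIFADO B;40.0;0.01;0.4\n"
)

df = pd.read_csv(io.StringIO(SAMPLE))  # comma is assumed by default

try:
    df["material"]
    column_found = True
except KeyError:
    # With the wrong delimiter there is no column called "material".
    column_found = False

print(column_found)
```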

The problem in this case is the separator. Pandas uses the comma as its default delimiter (or separator), and in this file the delimiter is a semicolon.

So we need to pass the delimiter right after the file path:
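As a sketch of the fix, `pd.read_csv` accepts a `sep` argument. The sample rows below are invented; with the real file, the first argument would be the path to the downloaded CSV.

```python
import io

import pandas as pd

# Made-up sample rows; only the column names come from the real dataset.
SAMPLE = (
    "material;almoxarifado;saldo;preco;valor_total\n"
    "SERINGA 10ML;ALMOXARIFADO A;1000.0;0.25;250.0\n"
    "COPO PLASTICO;ALMOXARIFADO B;40.0;0.01;0.4\n"
)

# Tell pandas that the fields are separated by semicolons.
df = pd.read_csv(io.StringIO(SAMPLE), sep=";")

print(df.shape)
print(df.columns.tolist())
```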

Now, here we go! ✌

Let’s check some information about the dataset, such as columns, data types, length, and missing values.
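One possible way to get this overview with plain pandas is sketched below. It runs on made-up sample rows, so the data types it prints will not necessarily match the ones reported for the real file.

```python
import io

import pandas as pd

# Made-up sample rows; only the column names come from the real dataset.
SAMPLE = (
    "material;almoxarifado;saldo;preco;valor_total\n"
    "SERINGA 10ML;ALMOXARIFADO A;1000.0;0.25;250.0\n"
    "COPO PLASTICO;ALMOXARIFADO B;40.0;0.01;0.4\n"
)
df = pd.read_csv(io.StringIO(SAMPLE), sep=";")

df.info()                     # columns, non-null counts and dtypes in one call
n_rows = len(df)              # number of rows
missing = df.isna().sum()     # missing values per column

print(n_rows)
print(missing)
```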

Note that we have 5 columns:

material (item): object
almoxarifado (warehouse): object
saldo (balance): float
preco (price): object
valor_total (total value): object

If we had to do arithmetic between columns, we would probably need to convert some of the object columns to integer or float values, but in this case we won’t need to.

Now we can analyse the data by column:
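For example, listing the distinct values of the `almoxarifado` column is a quick way to inspect the warehouses. This is a sketch on made-up rows; the two warehouse names are the ones quoted in this article, written here with a plain hyphen, which is an assumption about how they appear in the file.

```python
import io

import pandas as pd

# Made-up rows; the warehouse names are taken from the article's text.
SAMPLE = (
    "material;almoxarifado;saldo;preco;valor_total\n"
    "ITEM A;MEJC - ALMOXARIFADO NUTRIÇÃO;10.0;2.0;20.0\n"
    "ITEM B;MEJC - ALMOXARIFADO DE NUTRIÇÃO;5.0;4.0;20.0\n"
    "ITEM C;MEJC - ALMOXARIFADO DE NUTRIÇÃO;1.0;3.0;3.0\n"
)
df = pd.read_csv(io.StringIO(SAMPLE), sep=";")

# Distinct warehouse names and how many rows each one has.
names = sorted(df["almoxarifado"].unique())
counts = df["almoxarifado"].value_counts()

print(names)
print(counts)
```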

☝ Above we have the second issue:

There are 2 different names for the same warehouse: “MEJC - ALMOXARIFADO NUTRIÇÃO” and “MEJC - ALMOXARIFADO DE NUTRIÇÃO”.

It’s a typo, so let’s merge them into one.
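A minimal sketch of the merge using `Series.replace`. Which of the two spellings to keep is an assumption made here for illustration; the article does not say which one was chosen.

```python
import io

import pandas as pd

# Made-up rows; the warehouse names are taken from the article's text.
SAMPLE = (
    "material;almoxarifado;saldo;preco;valor_total\n"
    "ITEM A;MEJC - ALMOXARIFADO NUTRIÇÃO;10.0;2.0;20.0\n"
    "ITEM B;MEJC - ALMOXARIFADO DE NUTRIÇÃO;5.0;4.0;20.0\n"
)
df = pd.read_csv(io.StringIO(SAMPLE), sep=";")

OLD_NAME = "MEJC - ALMOXARIFADO NUTRIÇÃO"
NEW_NAME = "MEJC - ALMOXARIFADO DE NUTRIÇÃO"  # assumed canonical spelling

# Replace exact matches of the old name with the new one.
df["almoxarifado"] = df["almoxarifado"].replace(OLD_NAME, NEW_NAME)

print(df["almoxarifado"].unique())
```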

Done!

A very quick and easy way to get a lot of information in one line of code is Pandas Profiling.

Combining the earlier output with the Pandas Profiling report below, we have 9,087 rows and 4,536 distinct items (49.9% of the rows), and 0 missing cells.
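These headline numbers can also be cross-checked without the profiling report. The sketch below uses plain pandas on made-up rows; it is not what Pandas Profiling runs internally, only the same counts computed by hand.

```python
import io

import pandas as pd

# Made-up rows: 4 rows, 2 distinct items, no missing cells.
SAMPLE = (
    "material;almoxarifado;saldo;preco;valor_total\n"
    "ITEM A;ALMOXARIFADO A;10.0;2.0;20.0\n"
    "ITEM A;ALMOXARIFADO B;5.0;2.0;10.0\n"
    "ITEM B;ALMOXARIFADO A;1.0;3.0;3.0\n"
    "ITEM B;ALMOXARIFADO B;2.0;3.0;6.0\n"
)
df = pd.read_csv(io.StringIO(SAMPLE), sep=";")

n_rows = len(df)                          # total rows
n_distinct = df["material"].nunique()     # distinct item names
pct_distinct = 100 * n_distinct / n_rows  # share of distinct items
n_missing = int(df.isna().sum().sum())    # missing cells in the whole table

print(n_rows, n_distinct, round(pct_distinct, 1), n_missing)
```

On the real file the same ratio would be 4,536 / 9,087, which is about 49.9%.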

This means the dataframe doesn’t need much cleaning or treatment.

This dataset is small and has just a few columns. Like I said: first publication, simple things (sorry 😬)

Except for “saldo”, “preco” and “valor_total”, the columns are categorical. So now we can go to Power BI to make some visualizations and answer some questions.

But first, we need to export the dataset we edited in the Jupyter Notebook:
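A sketch of the export step with `DataFrame.to_csv`. The output file name, the temporary folder and `index=False` are assumptions made for this example; in the notebook the file would simply be written to the project folder.

```python
import io
import os
import tempfile

import pandas as pd

# Made-up sample rows; only the column names come from the real dataset.
SAMPLE = (
    "material;almoxarifado;saldo;preco;valor_total\n"
    "SERINGA 10ML;ALMOXARIFADO A;1000.0;0.25;250.0\n"
    "COPO PLASTICO;ALMOXARIFADO B;40.0;0.01;0.4\n"
)
df = pd.read_csv(io.StringIO(SAMPLE), sep=";")

# Hypothetical output location, used here so the example runs anywhere.
out_path = os.path.join(tempfile.mkdtemp(), "warehouse_clean.csv")

# index=False keeps the pandas row index out of the file;
# UTF-8 preserves accented Portuguese characters.
df.to_csv(out_path, index=False, encoding="utf-8")

print(os.path.exists(out_path))
```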

With the file saved in the same folder as the project, we just need to open it in Power BI.

By default, the file origin is set to “Western European”. Because of this, some Brazilian Portuguese words are displayed incorrectly.

We solve this by choosing “Unicode (UTF-8)” as the file origin.

In the Power Query editor, we can see that the columns already have the right types (Text, Text, Whole Number, Currency, Currency). So no transformation is needed; just close and apply.

LET’S GO TO THE QUESTIONS!

1. What are the items with the greatest quantity in stock:

Almost 2 million syringes and double-distilled water.
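The answer above comes from a Power BI chart. As a hedged cross-check, the same ranking could be computed in pandas by summing `saldo` per item; the rows and item names below are made up, so the numbers are illustrative only.

```python
import io

import pandas as pd

# Made-up rows; item names and quantities are illustrative only.
SAMPLE = (
    "material;almoxarifado;saldo;preco;valor_total\n"
    "SERINGA 10ML;ALMOXARIFADO A;1000.0;0.25;250.0\n"
    "SERINGA 10ML;ALMOXARIFADO B;500.0;0.25;125.0\n"
    "AGUA BIDESTILADA;ALMOXARIFADO A;800.0;1.0;800.0\n"
    "COPO PLASTICO;ALMOXARIFADO B;40.0;0.01;0.4\n"
)
df = pd.read_csv(io.StringIO(SAMPLE), sep=";")

# Total quantity per item across all warehouses, largest first.
top_quantity = df.groupby("material")["saldo"].sum().nlargest(2)

print(top_quantity)
```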

2. What are the items with the greatest value in stock:

WOW! Wait... close to $1 million in coffee??? It seems that teachers run on coffee! 😂

Note that the value of these 2 items is very different from the others (could they be outliers?)

3. Which warehouse has the greatest quantity of items in stock, and which has the lowest:

Although these values are rounded, the medicine warehouse has the largest number of items, followed by the infrastructure warehouse.

And the museum warehouse has the lowest number of items.
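The same comparison can be sketched in pandas by summing `saldo` per warehouse and taking the largest and smallest totals. The warehouse names and quantities below are invented, and summing `saldo` is only one reading of "quantity of items"; the Power BI visual may aggregate differently.

```python
import io

import pandas as pd

# Made-up rows; warehouse names and quantities are illustrative only.
SAMPLE = (
    "material;almoxarifado;saldo;preco;valor_total\n"
    "ITEM A;ALMOXARIFADO A;1000.0;0.25;250.0\n"
    "ITEM B;ALMOXARIFADO A;200.0;1.0;200.0\n"
    "ITEM A;ALMOXARIFADO B;900.0;0.25;225.0\n"
    "ITEM C;ALMOXARIFADO C;3.0;10.0;30.0\n"
)
df = pd.read_csv(io.StringIO(SAMPLE), sep=";")

# Total quantity in stock per warehouse.
by_warehouse = df.groupby("almoxarifado")["saldo"].sum()

largest = by_warehouse.idxmax()   # warehouse with the highest total
smallest = by_warehouse.idxmin()  # warehouse with the lowest total

print(largest, smallest)
```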

4. Which warehouse has the greatest value in stock:

We saw earlier that the medicine warehouse has the largest number of items, slightly more than the infrastructure warehouse. But the stock value of the infrastructure warehouse is twice that of the medicine warehouse.

5. What are the most expensive items:

OK, so a 250g package of coffee costs 30.4 thousand?? Sounds weird 🤔

Well, it’s true! 😐 Also $11 thousand in plastic cups.

Another strange thing I noticed is that 107 items have a price of 1 cent:
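A sketch of how such rows could be counted in pandas. It assumes `preco` has already been converted to a numeric column (in the real file it was read as object), and the rows below are made up.

```python
import io

import pandas as pd

# Made-up rows; two of them have a price of 1 cent.
SAMPLE = (
    "material;almoxarifado;saldo;preco;valor_total\n"
    "ITEM A;ALMOXARIFADO A;10.0;0.01;0.1\n"
    "ITEM B;ALMOXARIFADO A;5.0;0.01;0.05\n"
    "ITEM C;ALMOXARIFADO B;2.0;3.5;7.0\n"
)
df = pd.read_csv(io.StringIO(SAMPLE), sep=";")

# Keep only the rows whose unit price is exactly 1 cent.
one_cent = df[df["preco"] == 0.01]

print(len(one_cent))
print(one_cent["material"].tolist())
```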

Do all these items really cost just 1 cent? Someone will need to answer that.

SO WHAT??

Analysing this dataset, we have seen that A LOT of money is spent on coffee and plastic cups. ALMOST 1 MILLION!!!??? 💸💸💸

Maybe something is wrong… With this information I would probably check the veracity of the facts. Perhaps the item descriptions are not as meaningful as they could be (how big is this package?)

Some inconclusive findings can turn into conclusions once the data is checked. In this case, some data curation may be needed.

If you have any tips, comments or suggestions, I would be very happy to learn more from you. I’m just a beginner.

Thank you for your time! 🙏

Translated from: https://medium.com/análises-exploratórias-de-dados/exploratory-analysis-of-ufrn-universidade-federal-do-rio-grande-do-norte-warehouses-e6f2ff334b0f
