HDFS文件目录操作代码

分布式文件系统HDFS中对文件/目录的相关操作代码,整理了一下,大概包括以下部分:

  • 文件夹的新建、删除、重命名
  • 文件夹中子文件和目录的统计
  • 文件的新建及显示文件内容
  • 文件在local和remote间的相互复制
  • 定位文件在HDFS中的位置,以及副本存放的主机
  • HDFS资源使用情况

1. 新建文件夹

public void mkdirs(String folder) throws IOException {Path path = new Path(folder);FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);if (!fs.exists(path)) {fs.mkdirs(path);System.out.println("Create: " + folder);}fs.close();
}

 

2. 删除文件夹

public void rmr(String folder) throws IOException {Path path = new Path(folder);FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);fs.deleteOnExit(path);System.out.println("Delete: " + folder);fs.close();
}

 

3. 文件重命名

public void rename(String src, String dst) throws IOException {Path name1 = new Path(src);Path name2 = new Path(dst);FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);fs.rename(name1, name2);System.out.println("Rename: from " + src + " to " + dst);fs.close();
}

 

4. 列出文件夹中的子文件及目录

public void ls(String folder) throws IOException {Path path = new Path(folder);FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);FileStatus[] list = fs.listStatus(path);System.out.println("ls: " + folder);System.out.println("==========================================================");for (FileStatus f : list) {System.out.printf("name: %s, folder: %s, size: %d\n", f.getPath(), f.isDirectory(), f.getLen());}System.out.println("==========================================================");fs.close();
}

 

5. 创建文件,并添加内容

public void createFile(String file, String content) throws IOException {FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);byte[] buff = content.getBytes();FSDataOutputStream os = null;try {os = fs.create(new Path(file));os.write(buff, 0, buff.length);System.out.println("Create: " + file);} finally {if (os != null)os.close();}fs.close();
}

 

6. 将local数据复制到remote

public void copyFile(String local, String remote) throws IOException {FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);fs.copyFromLocalFile(new Path(local), new Path(remote));System.out.println("copy from: " + local + " to " + remote);fs.close();
}

 

7. 将remote数据下载到local

public void download(String remote, String local) throws IOException {Path path = new Path(remote);FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);fs.copyToLocalFile(path, new Path(local));System.out.println("download: from" + remote + " to " + local);fs.close();
}

 

8. 显示文件内容

    public String cat(String remoteFile) throws IOException {Path path = new Path(remoteFile);FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);FSDataInputStream fsdis = null;System.out.println("cat: " + remoteFile);OutputStream baos = new ByteArrayOutputStream();String str = null;try {fsdis = fs.open(path);IOUtils.copyBytes(fsdis, baos, 4096, false);str = baos.toString();} finally {IOUtils.closeStream(fsdis);fs.close();}System.out.println(str);return str;}

 

9. 定位一个文件在HDFS中存储的位置,以及多个副本存储在集群哪些节点上

public void location() throws IOException {String folder = hdfsPath + "create/";String file = "t2.txt";FileSystem fs = FileSystem.get(URI.create(hdfsPath), new Configuration());FileStatus f = fs.getFileStatus(new Path(folder + file));BlockLocation[] list = fs.getFileBlockLocations(f, 0, f.getLen());System.out.println("File Location: " + folder + file);for (BlockLocation bl : list) {String[] hosts = bl.getHosts();for (String host : hosts) {System.out.println("host:" + host);}}fs.close();
}

 

10. 获取HDFS集群存储资源使用情况

public void getTotalCapacity() {try {FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);FsStatus fsStatus = fs.getStatus();System.out.println("总容量:" + fsStatus.getCapacity());System.out.println("使用容量:" + fsStatus.getUsed());System.out.println("剩余容量:" + fsStatus.getRemaining());} catch (IOException e) {e.printStackTrace();}
}

 

完整代码

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.mapred.JobConf;/*
* HDFS工具类
* 
*/
public class Hdfs {private static final String HDFS = "hdfs://10.20.14.47:8020/";public Hdfs(Configuration conf) {this(HDFS, conf);}public Hdfs(String hdfs, Configuration conf) {this.hdfsPath = hdfs;this.conf = conf;}private String hdfsPath;private Configuration conf;public static void main(String[] args) throws IOException {JobConf conf = config();Hdfs hdfs = new Hdfs(conf);hdfs.createFile("/create/t2.txt", "12");hdfs.location();}public static JobConf config() {JobConf conf = new JobConf(Hdfs.class);conf.setJobName("HdfsDAO");conf.addResource("classpath:/hadoop/core-site.xml");conf.addResource("classpath:/hadoop/hdfs-site.xml");conf.addResource("classpath:/hadoop/mapred-site.xml");return conf;}/** 创建文件夹*/public void mkdirs(String folder) throws IOException {Path path = new Path(folder);FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);if (!fs.exists(path)) {fs.mkdirs(path);System.out.println("Create: " + folder);}fs.close();}/** 删除文件夹*/public void rmr(String folder) throws IOException {Path path = new Path(folder);FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);fs.deleteOnExit(path);System.out.println("Delete: " + folder);fs.close();}/** 文件重命名*/public void rename(String src, String dst) throws IOException {Path name1 = new Path(src);Path name2 = new Path(dst);FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);fs.rename(name1, name2);System.out.println("Rename: from " + src + " to " + dst);fs.close();}/** 列出文件夹中的子文件及目录*/public void ls(String folder) throws IOException {Path path = new Path(folder);FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);FileStatus[] list = fs.listStatus(path);System.out.println("ls: " + folder);System.out.println("==========================================================");for (FileStatus f : list) {System.out.printf("name: %s, folder: %s, size: %d\n", f.getPath(), f.isDirectory(), f.getLen());}System.out.println("==========================================================");fs.close();}/** 创建文件,并添加内容*/public void createFile(String file, String content) throws IOException {FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);byte[] buff = content.getBytes();FSDataOutputStream os = null;try {os = fs.create(new Path(file));os.write(buff, 0, buff.length);System.out.println("Create: " + file);} finally {if (os != null)os.close();}fs.close();}/** 将local的数据复制到remote*/public void copyFile(String local, String remote) throws IOException {FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);fs.copyFromLocalFile(new Path(local), new Path(remote));System.out.println("copy from: " + local + " to " + remote);fs.close();}/** 将remote数据下载到local*/public void download(String remote, String local) throws IOException {Path path = new Path(remote);FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);fs.copyToLocalFile(path, new Path(local));System.out.println("download: from" + remote + " to " + local);fs.close();}/** 显示文件内容*/public String cat(String remoteFile) throws IOException {Path path = new Path(remoteFile);FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);FSDataInputStream fsdis = null;System.out.println("cat: " + remoteFile);OutputStream baos = new ByteArrayOutputStream();String str = null;try {fsdis = fs.open(path);IOUtils.copyBytes(fsdis, baos, 4096, false);str = baos.toString();} finally {IOUtils.closeStream(fsdis);fs.close();}System.out.println(str);return str;}/** 定位一个文件在HDFS中存储的位置,以及多个副本存储在集群哪些节点上*/public void location() throws IOException {String folder = hdfsPath + "create/";String file = "t2.txt";FileSystem fs = FileSystem.get(URI.create(hdfsPath), new Configuration());FileStatus f = fs.getFileStatus(new Path(folder + file));BlockLocation[] list = fs.getFileBlockLocations(f, 0, f.getLen());System.out.println("File Location: " + folder + file);for (BlockLocation bl : list) {String[] hosts = bl.getHosts();for (String host : hosts) {System.out.println("host:" + host);}}fs.close();}/** 获取HDFS资源使用情况*/public void getTotalCapacity() {try {FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);FsStatus fsStatus = fs.getStatus();System.out.println("总容量:" + fsStatus.getCapacity());System.out.println("使用容量:" + fsStatus.getUsed());System.out.println("剩余容量:" + fsStatus.getRemaining());} catch (IOException e) {e.printStackTrace();}}/** 获取某文件中包含的目录数,文件数,及占用空间大小*/public void getContentSummary(String path) {ContentSummary cs = null;try {FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);cs = fs.getContentSummary(new Path(path));} catch (Exception e) {e.printStackTrace();}// 目录数Long directoryCount = cs.getDirectoryCount();// 文件数Long fileCount = cs.getFileCount();// 占用空间Long length = cs.getLength();System.out.println("目录数:" + directoryCount);System.out.println("文件数:" + fileCount);System.out.println("占用空间:" + length);}
}
View Code

 

转载于:https://www.cnblogs.com/walker-/p/9768834.html

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/279212.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

craigslist_如何设置Craigslist警报(用于电子邮件或SMS)

craigslistWhether you’re looking for apartments or used gadgets on Craigslist, you don’t have to keep checking the website. You can stay on top of things by getting notified when new posts go up that match your searches. 无论您是在Craigslist上寻找公寓还是…

Django模板语言中的自定义方法filter过滤器实现web网页的瀑布流

模板语言自定义方法介绍 自定义方法注意事项 Django中有simple_tag 和 filter 两种自定义方法,之前也提到过,需要注意的是 扩展目录名称必须是templatetagstemplatetags中的自定义标签和过滤器必须依赖于一个django app,也就是说,自定义标签和过滤器是绑…

dsp怪胎_2012年6月最佳怪胎文章

dsp怪胎This past month we covered topics such as why you only have to wipe a disk once to erase it, what RSS is and how you can benefit from using it, how websites are tracking you online, and more. Join us as we look back at the best articles for June. 在…

如何在Ubuntu上查看和写入系统日志文件

Linux logs a large amount of events to the disk, where they’re mostly stored in the /var/log directory in plain text. Most log entries go through the system logging daemon, syslogd, and are written to the system log. Linux将大量事件记录到磁盘上&#xff0c…

向Ubuntu提供反馈的5种方法

Ubuntu, like many other Linux distributions, is a community-developed operating system. In addition to getting involved and submitting patches, there are a variety of ways you can provide useful feedback and suggest features to Ubuntu. 与许多其他Linux发行版…

Tomcat 发布项目 conf/Catalina/localhost 配置 及数据源配置

本文介绍通过在tomcat的conf/Catalina/localhost目录下添加配置文件,来发布项目。因为这样对 tomcat 的入侵性最小,只需要新增一个配置文件,不需要修改原有配置;而且支持动态解析,修改完代码直接生效(修改配置除外)。在…

Centos7 中文乱码

1. 安装中文库 yum groupinstall "fonts" 2. 检查是否有中文语言包 locale -a 3. 查看当前系统语言环境 locale 解析如下 LANG:当前系统的语言LC_CTYPE:语言符号及其分类LC_NUMERIC:数字LC_COLLATE:比较和排序习惯LC_TIME&#xff…

chrome自动退出的原因_Chrome 70将让用户选择退出新的自动登录功能

chrome自动退出的原因An upcoming Chrome option allows users to log into Google accounts without logging into the browser. The change was prompted by a backlash among users and privacy advocates. 即将推出的Chrome选项允许用户无需登录浏览器即可登录Google帐户。…

学习笔记DL007:Moore-Penrose伪逆,迹运算,行列式,主成分分析PCA

2019独角兽企业重金招聘Python工程师标准>>> Moore-Penrose伪逆(pseudoinverse)。 非方矩阵,逆矩阵没有定义。矩阵A的左逆B求解线性方程Axy。两边左乘左逆B,xBy。可能无法设计唯一映射将A映射到B。矩阵A行数大于列数,方程无解。矩…

mysql40题_mysql40题

一、表关系请创建如下表,并创建相关约束导入现有数据库数据:/*Navicat Premium Data TransferSource Server : localhostSource Server Type : MySQLSource Server Version :50624Source Host : localhostSource Database : sqlexamTarget Server Type :…

ubuntu取消主目录加密_如何在Ubuntu上恢复加密的主目录

ubuntu取消主目录加密Access an encrypted home directory when you’re not logged in – say, from a live CD – and all you’ll see is a README file. You’ll need a terminal command to recover your encrypted files. 当您未登录时(例如,从实时CD)访问加密…

python数据结构与算法第六讲_Python 学习 -- 数据结构与算法 (六)

栈 是一种 “操作受限”的线性表,只允许在一端插入和删除数据。从功能是上来说,数组和链表确实可以替代栈,但是特定的数据结构是对特定场景的抽象,而且,数组或链表暴露了太多的操作接口,操作上的确灵活自由…

spring-springmvc code-based

idea设置maven在下载依赖的同时把对应的源码下载过来。图0:1主要实现零配置来完成springMVC环境搭建,当然现在有了springBoot也是零配置,但是很多同仁都是从spring3.x中的springMVC直接过渡到springBoot的,spring3.x的MVC大部分都…

powershell 入门_使用PowerShell入门的5个Cmdlet

powershell 入门PowerShell is quickly becoming the preferred scripting language and CLI of Power Users as well as IT Pros. It’s well worth learning a few commands to get you started, so we’ve got 5 useful cmdlets for you to learn today. PowerShellSwift成为…

Part 3: Services

介绍 在第3部分中,我们将扩展应用程序并启用负载平衡。为此,我们必须在分布式应用程序的层次结构中提升一个级别:服务。 StackServices (你在这里)Container (涵盖在第2部分中)关于服务 在分布式应用程序中,应用程序的不同部分被称为“服务”…

diy感应usb摄像头拍照_DIY无线感应充电器

diy感应usb摄像头拍照Courtesy of Instructables user Inducktion shares a very detailed tutorial on how to build a wireless power charger. He explains the impetus behind the project: 由Instructables用户提供Inducktion分享了有关如何构建无线电源充电器的非常详细…

常用模块之 time,datetime,random,os,sys

time与datetime模块 先认识几个python中关于时间的名词: 时间戳(timestamp):通常来说,时间戳表示的是从1970年1月1日00:00:00开始按秒计算的偏移量。我们运行“type(time.time())”,返回的是float类型。1970年之前的日期无法以此表…

使用aSpotCat控制您的Android应用权限

Viewing the permissions of each installed Android app requires digging through the Manage Applications screen and examining each app one by one — or does it? aSpotCat takes an inventory of the apps on your system and the permissions they require. 要查看每…

xtrabackup备份mysql“ib_logfile0 is of different”错误分析

今天用xtrabackup工具完整备份mysql数据库的时候出现“./ib_logfile0 is of different”错误,具体的日志信息如下: 我第一时间查询了百度和谷歌都没有找见相对应的答案。决定从错误日志入手,上面的日志提示说:mysql数据库inondb的日志文件的大…

如何使自己的不和谐机器人

Discord has an excellent API for writing custom bots, and a very active bot community. Today we’ll take a look at how to get started making your own. Discord具有出色的用于编写自定义机器人的API,以及非常活跃的机器人社区。 今天,我们将探…