The HDFS filesystem is just an abstraction layer on top of the local filesystem on the node(s). There are two built-in methods for accessing and manipulating HDFS: the CLI and a web-based GUI.
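The HDFS CLI commands are part of the hadoop binary and are accessed with the dfs option. hadoop fs works as well, as the first example below shows; on Hadoop 2 and later, hadoop dfs is deprecated in favor of hdfs dfs. Add the -fs option to override the filesystem specified in the configuration file, or pass -fs local to use the local filesystem instead.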
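Here is a minimal sketch of the -fs override. The shell function below (hdfs_local is a hypothetical name, not part of Hadoop) forwards its arguments to hadoop fs with the default filesystem replaced by the local one. It passes file:///, which Hadoop accepts in place of the older local shorthand. Generic options such as -fs have to come before the command switch.

```shell
# Hypothetical helper: run an fs command against the local filesystem
# instead of the default filesystem from the Hadoop configuration.
# The generic -fs option must precede the command switch (-ls, -du, ...).
hdfs_local() {
  hadoop fs -fs file:/// "$@"
}

# Usage: list /tmp on the local disk rather than in HDFS
#   hdfs_local -ls /tmp
```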
Listing files is done with the -ls switch. Optionally pass a path to list. The -lsr switch is used to recursively list files.
$ hadoop fs -ls /
Found 3 items
drwxrwxrwx - 0 1969-12-31 18:00 /
drwxrwxrwx - 0 1969-12-31 18:00 /tmp
drwxrwxrwx - 0 1969-12-31 18:00 /user
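When a listing feeds a script, usually only the path column matters. The following is a minimal sketch that assumes paths contain no whitespace, because awk splits fields on whitespace. hdfs_ls_paths is a hypothetical helper. It drops the "Found N items" header and prints the last field of each line. That field is the path both in the short format above and in listings that also show owner and group.

```shell
# Hypothetical helper: print only the paths from a `hadoop fs -ls` listing.
# Skips the "Found N items" header; assumes paths contain no whitespace.
hdfs_ls_paths() {
  hadoop fs -ls "$@" | awk '!/^Found [0-9]+ items/ { print $NF }'
}

# Usage: iterate over the top-level entries
#   for p in $(hdfs_ls_paths /); do echo "entry: $p"; done
```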
Disk usage is shown with the -du option, which is similar to the Linux du command.
$ hadoop dfs -du /tmp
Found 2 items
1205891 hdfs://localhost:9000/tmp/feder16.txt
301987 hdfs://localhost:9000/tmp/hadoop-hadoop
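To get a single total for a directory, you can sum the per-entry sizes. Here is a minimal sketch; hdfs_du_total is a hypothetical name. It adds up the first column of every line that starts with a number, which skips the "Found N items" header. Some newer releases print an extra disk-space-consumed column, but the first column is still the size in bytes. The -dus option (-du -s in later releases) reports a summary directly.

```shell
# Hypothetical helper: total size in bytes of the entries under a path.
# Only lines whose first field is numeric are counted, which skips the
# "Found N items" header printed by older releases.
hdfs_du_total() {
  hadoop fs -du "$@" | awk '$1 ~ /^[0-9]+$/ { total += $1 } END { print total + 0 }'
}

# Usage: with the /tmp listing above this prints 1507878 (1205891 + 301987)
#   hdfs_du_total /tmp
```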
Directories are created with the -mkdir option, passing the directory to create as an argument.
$ hadoop dfs -mkdir /in
Files are copied and moved with the -cp and -mv options respectively. Pass the source and then the destination after -cp or -mv.
$ hadoop dfs -cp /tmp/feder16.txt /in
$ hadoop dfs -mv /tmp/feder16.txt /tmp/feder16.txt.2
Files are added to HDFS with the -put option, which was briefly covered in the section on pseudo-cluster operation. The -copyFromLocal option is an alias for -put. Pass the local source first, then the HDFS destination. There is also a -moveFromLocal option that deletes the local source file after uploading it to HDFS.
$ hadoop dfs -put feder16.txt /
$ hadoop dfs -copyFromLocal feder16.txt /in
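After running a job, the output data needs to be downloaded from HDFS. The -get option is used for that, and the -copyToLocal option is an alias for -get. Pass the HDFS source first, then the local destination. There is also a -moveToLocal option meant to delete the HDFS files once they have been copied back to local disk, but Hadoop's own documentation describes it as not yet implemented (it only prints a message), so in practice run -get followed by -rm.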
$ hadoop dfs -get /feder16.txt feder16.get.txt
$ hadoop dfs -copyToLocal /in/feder16.txt feder16.copy.txt
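A quick way to confirm a download is to compare it with a known local copy. The function below is a hypothetical sketch. It runs -get and then cmp -s, so it returns a non-zero status if either the download fails or the bytes differ. It assumes the local destination does not exist yet, because -get does not overwrite an existing local file by default.

```shell
# Hypothetical helper: download an HDFS file and check it byte for byte
# against a local reference copy. Returns non-zero if the download fails
# or the contents differ.
hdfs_get_and_verify() {
  local src=$1 dst=$2 ref=$3
  hadoop fs -get "$src" "$dst" && cmp -s "$ref" "$dst"
}

# Usage: round-trip the file uploaded earlier
#   hdfs_get_and_verify /feder16.txt feder16.check.txt feder16.txt && echo "match"
```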
Deleting files is done with the -rm option. To delete files and directories recursively use -rmr.
$ hadoop dfs -rm /in/feder16.txt
Deleted hdfs://localhost:9000/in/feder16.txt
$ hadoop dfs -rmr /in
Deleted hdfs://localhost:9000/in
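A MapReduce job fails if its output directory already exists, so scripts that rerun a job often delete the old output first. Here is a minimal sketch using the -test -e switch, which returns exit status 0 when the path exists. hdfs_rm_if_exists is a hypothetical helper. It uses -rmr to match the examples above; later releases prefer -rm -r.

```shell
# Hypothetical helper: recursively delete an HDFS path only if it exists,
# so a missing path is not reported as an error.
hdfs_rm_if_exists() {
  if hadoop fs -test -e "$1"; then
    hadoop fs -rmr "$1"
  fi
}

# Usage: remove the /in directory created earlier, if it is still there
#   hdfs_rm_if_exists /in
```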
There are many more commands, including -cat, -tail, -chmod, -chown and -stat. For a full list, run hadoop dfs -help.
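The -cat option streams a whole file to standard output, while -tail shows only its last kilobyte. To peek at the beginning of a large file, you can pipe -cat through head. hdfs_head is a hypothetical helper that sketches this.

```shell
# Hypothetical helper: print the first N lines (default 10) of an HDFS file.
# head stops reading after N lines, so the rest of the file is not printed.
hdfs_head() {
  hadoop fs -cat "$1" | head -n "${2:-10}"
}

# Usage:
#   hdfs_head /in/feder16.txt 5
```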
In addition to the CLI, HDFS has an integrated web interface, available at http://localhost:50070/ (Hadoop 3.x moved the default port to 9870). This interface provides information about the NameNode and lets you browse the filesystem.