The Hadoop Distributed File System (HDFS)

Data is the new fuel. It is collected on a large scale for analysis such as deriving trending topics, security analysis (behavior analysis), pattern recognition, machine learning, etc. Therefore, there is a need to store this huge amount of data reliably.

HDFS was developed at Yahoo and later open sourced. Let’s start with an overview of HDFS:
HDFS is the file system component of Hadoop, and it is patterned after the UNIX file system. It is designed for two main purposes:

  • store large data sets reliably, even as the data sets grow with demand.
  • stream this data to user applications at high I/O bandwidth.

An important characteristic of HDFS is that it partitions both the data and the computation across many machines and runs the computations in parallel, close to the data. HDFS stores file system metadata and application data separately.

For Reliability:
HDFS replicates content on different nodes. By default it replicates data on 3 nodes; this can be changed, as HDFS provides an API that exposes the replication factor.

For Namespace:
The HDFS namespace is a hierarchy of files and directories. The entire namespace is kept in RAM. I will cover more of it in later sections.


I) NameNode:
The NameNode stores the file system metadata, including the inodes and the namespace information. As in UNIX-like systems, an inode contains file attributes like permissions, modification and access times, disk space quota, etc. Application data is split into blocks of 128 MB, and each data block is replicated on DataNodes.
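
To make the block arithmetic concrete, here is a minimal Python sketch, assuming the 128 MB block size and the default replication factor of 3 mentioned above. The function names are made up for illustration and are not part of any HDFS API.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size quoted above
REPLICATION = 3                 # default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the length in bytes of each block of a file of file_size bytes."""
    full, rest = divmod(file_size, block_size)
    # The last block only holds the leftover bytes, not a full block.
    return [block_size] * full + ([rest] if rest else [])

def raw_storage(file_size, replication=REPLICATION):
    """Bytes consumed across the cluster once every block is replicated."""
    return file_size * replication

MB = 1024 * 1024
blocks = split_into_blocks(300 * MB)
print(len(blocks), "blocks, last one is", blocks[-1] // MB, "MB")
print("raw storage:", raw_storage(300 * MB) // MB, "MB")
```

So a 300 MB file becomes two full blocks plus one 44 MB block, and with three replicas it occupies 900 MB of raw disk across the cluster.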

a. Data Read: An HDFS client contacts the NameNode to get the locations of all the data blocks and reads the block contents from the nearest DataNode. If the read from the first replica fails, it tries the second replica, and so on.

b. Data Write: An HDFS client asks the NameNode to nominate 3 DataNodes to host the replicas, then writes the data to them in a pipelined fashion. If more space is needed, a new block is allocated and written in the same pipelined way.
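
The read and write paths in points a and b can be sketched with a small in-memory simulation. This is only an illustration under my own assumptions: the `DataNode` class, `read_block` and `write_block` are hypothetical stand-ins, not the real HDFS client library.

```python
class DataNode:
    """Hypothetical in-memory stand-in for a DataNode."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.blocks = {}  # block id -> bytes

    def read(self, block_id):
        if not self.alive:
            raise OSError(f"{self.name} is unreachable")
        return self.blocks[block_id]

    def write(self, block_id, data, downstream):
        """Store the block, then forward it to the next node in the pipeline."""
        self.blocks[block_id] = data
        if downstream:
            downstream[0].write(block_id, data, downstream[1:])


def read_block(block_id, replicas):
    """Try the replicas in order (nearest first) until one read succeeds."""
    for node in replicas:
        try:
            return node.read(block_id)
        except (OSError, KeyError):
            continue  # this replica failed, fall back to the next one
    raise OSError(f"no replica of {block_id} could be read")


def write_block(block_id, data, pipeline):
    """The client only talks to the first node; the nodes relay the data."""
    pipeline[0].write(block_id, data, pipeline[1:])


# The three nodes stand for the DataNodes nominated by the NameNode.
nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
write_block("blk_1", b"hello", nodes)
nodes[0].alive = False                # the nearest replica goes down
result = read_block("blk_1", nodes)   # served by dn2 instead
print(result)
```

The point of the pipeline is that the client sends each block once, and the DataNodes pass it along to each other.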

c. Namespace: HDFS keeps the entire namespace in RAM, so that reads and writes are faster.

d. Image: The inode data and the list of blocks belonging to each file comprise the metadata, which is called the image.

e. Checkpoint: For backup and reliability, a persistent record of the image is stored in the local host’s native file system.

f. Journal: A log file that records the changes happening in the system. This modification log of the image is also stored in the local host’s native file system.

Furthermore, for durability, redundant copies of the checkpoint and journal can be made.
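
Here is a rough sketch of how the image, checkpoint and journal fit together: on restart, the in-memory image can be rebuilt by loading the checkpoint and replaying the journal on top of it. The operation names (`create`, `add_block`, `delete`) and the dictionary layout are my own simplification, not the real on-disk format.

```python
import copy

def apply_op(image, op):
    """Apply one journal record to the in-memory image (path -> block list)."""
    kind, path, *args = op
    if kind == "create":
        image[path] = []
    elif kind == "add_block":
        image[path].append(args[0])
    elif kind == "delete":
        image.pop(path, None)

def recover(checkpoint, journal):
    """Rebuild the image: start from the checkpoint, then replay the journal."""
    image = copy.deepcopy(checkpoint)  # leave the checkpoint itself untouched
    for op in journal:
        apply_op(image, op)
    return image

checkpoint = {"/logs/a.txt": ["blk_1"]}
journal = [
    ("create", "/logs/b.txt"),
    ("add_block", "/logs/b.txt", "blk_2"),
    ("delete", "/logs/a.txt"),
]
image = recover(checkpoint, journal)
print(image)
```

Because the journal is replayed in order, the recovered image reflects every change made after the checkpoint was taken.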

II) DataNode:
It stores the application data. Each block replica is represented as two files:
the first holds the data itself, the second holds the checksum and the block’s generation stamp. When a new DataNode is added, or on startup, a handshake between the NameNode and the DataNode is performed.
The handshake basically does this:

  • Verify the namespace ID of the DataNode,
  • Verify the software version of the DataNode.

If either of these does not match, the DataNode is disconnected and shut down.
If the handshake is successful, the DataNode is given a storage ID at registration, which remains constant afterwards.
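
The handshake and registration can be sketched as follows. This is a simplified model under my own assumptions: the `NodeInfo` class is hypothetical, and using a random UUID as the storage ID is just a placeholder for whatever identifier the real system assigns.

```python
import uuid
from dataclasses import dataclass

@dataclass
class NodeInfo:
    namespace_id: str
    software_version: str
    storage_id: str = ""   # empty until the node registers for the first time

def handshake(namenode, datanode):
    """Both the namespace ID and the software version must match."""
    return (datanode.namespace_id == namenode.namespace_id
            and datanode.software_version == namenode.software_version)

def register(namenode, datanode):
    """Return True if the DataNode may join, False if it must shut down."""
    if not handshake(namenode, datanode):
        return False
    if not datanode.storage_id:
        # Assigned once; later registrations keep the same ID.
        datanode.storage_id = uuid.uuid4().hex
    return True

nn = NodeInfo("ns-1", "1.0")
good = NodeInfo("ns-1", "1.0")
bad = NodeInfo("ns-2", "1.0")   # belongs to a different namespace

print(register(nn, good), register(nn, bad))
```

Registering the same DataNode again keeps its storage ID, which matches the idea that the ID stays constant after the first registration.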

A DataNode sends a heartbeat signal every 3 seconds. The NameNode communicates with a DataNode by replying to that DataNode’s heartbeat. The reply basically carries instructions, such as replicating a block to other nodes, removing duplicate replicas, or sending some reports.
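
A minimal sketch of the heartbeat exchange: the NameNode never calls a DataNode directly, it just queues commands and hands them over in the heartbeat reply. The class, the command tuples and the `DEAD_AFTER` timeout are assumptions made for this example.

```python
HEARTBEAT_INTERVAL = 3   # seconds, as stated above
DEAD_AFTER = 10 * 60     # seconds; an assumed timeout for this sketch

class NameNode:
    def __init__(self):
        self.last_seen = {}  # storage id -> time of the last heartbeat
        self.pending = {}    # storage id -> commands waiting for that node

    def queue(self, storage_id, command):
        self.pending.setdefault(storage_id, []).append(command)

    def heartbeat(self, storage_id, now):
        """Record the heartbeat and return the queued commands as the reply."""
        self.last_seen[storage_id] = now
        return self.pending.pop(storage_id, [])

    def dead_nodes(self, now):
        """Nodes whose last heartbeat is older than the timeout."""
        return [s for s, t in self.last_seen.items() if now - t > DEAD_AFTER]

nn = NameNode()
nn.queue("dn1", ("replicate", "blk_7", "dn2"))
first_reply = nn.heartbeat("dn1", now=0)                    # carries the command
second_reply = nn.heartbeat("dn1", now=HEARTBEAT_INTERVAL)  # nothing pending
nn.heartbeat("dn2", now=600)
dead = nn.dead_nodes(now=700)   # dn1 has been silent for too long
print(first_reply, second_reply, dead)
```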

Every hour, a DataNode also sends a block report to the NameNode, which contains the block ID, generation stamp and length of each block.
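
One way to picture what the NameNode can do with a block report is shown below: it refreshes its map of which node holds which block, and can then spot blocks with too few replicas. The `BlockInfo` tuple and both functions are hypothetical and heavily simplified.

```python
from collections import namedtuple

# One entry of a block report, with the three fields listed above.
BlockInfo = namedtuple("BlockInfo", "block_id generation_stamp length")

def process_report(locations, storage_id, report):
    """Refresh the view of which blocks the node storage_id holds."""
    for holders in locations.values():
        holders.discard(storage_id)   # forget what we believed before
    for info in report:
        locations.setdefault(info.block_id, set()).add(storage_id)
    return locations

def under_replicated(locations, replication=3):
    """Blocks that currently have fewer replicas than required."""
    return [b for b, holders in locations.items() if len(holders) < replication]

# The NameNode believes dn1 holds blk_1 and blk_2 ...
locations = {"blk_1": {"dn1", "dn2", "dn3"}, "blk_2": {"dn1", "dn2", "dn3"}}
# ... but dn1's report only lists blk_1.
report = [BlockInfo("blk_1", 1001, 128 * 1024 * 1024)]
process_report(locations, "dn1", report)
print(under_replicated(locations))
```

After the report, `blk_2` has only two known replicas, so it becomes a candidate for re-replication.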

<< Image of HDFS Client Interactions >>

III) HDFS Client:
It is a library that exposes the HDFS file system interface. HDFS supports read, write and delete operations.
As explained under NameNode (points a and b), HDFS reads and writes go through the client.

IV) Backup Node:
It keeps an in-memory, up-to-date image of the namespace. The NameNode sends its journal information to the Backup Node.

So this covers the basic HDFS architecture and should give you a good overview of each term used while discussing HDFS. For more detailed information on HDFS, you should refer to the links below:

“The Hadoop Distributed File System”, published in 2010 at the IEEE 26th Symposium on Mass Storage Systems and Technologies.
The Hadoop Wiki, which contains the latest information on Hadoop.

Understanding Big Data Systems

There is a lot of talk about Big Data systems, and many new system architectures with different functionalities have come up. I am sharing my knowledge about the layered architecture of some Big Data systems. I plan to cover these topics in depth:
1. HDFS (Hadoop Distributed File System),
2. YARN (Yet Another Resource Negotiator),
3. Tachyon,
4. Apache Spark,
5. Spark Streaming, Spark SQL,
6. Storm/Heron,
7. GraphX,
8. BlinkDB.

I will cover each topic with a system overview, its history, its architecture, where and how it is used in practice, and some useful links for more information.

My first Blood Donation

March 20th, 2014. I have a fear of blood. I could not even bear to be in a hospital environment. You must have noticed a particular smell when you enter a hospital. That feeling came to my mind when I entered the blood donation camp organised on my office campus. I sensed it at once and told myself to be brave and bold. Even people who are physically weaker than me were donating blood, so why couldn’t I? That boldness kept me there, but when they asked to take my blood sample, I froze and all my boldness went for a toss. My mind said “Mujhse nai ho payega”, or in English, “I can’t do it”. I failed and left the donation camp. Somehow, after leaving the camp, I felt relieved that I had at least tried. But I was not feeling happy.

March 25th, 2015. My company organised the same blood donation camp again. A year had passed and I had almost forgotten last year’s nauseous feeling. My friends again encouraged me to come and donate, but I refused. Later I felt like a coward, but I ignored that feeling. Just half an hour later, another teammate somehow convinced me to come and donate. His words gave me confidence that nothing would go wrong: it is just a first-time fear and you will easily get over it. So I thought, let’s go and see, whatever happens (in Hindi, “jo hoga dekha jayega”). I went, and the same nauseous feeling started coming back. But this time I stayed confident. I filled in the form, which asks about past treatments or operations and any allergies you have. My answer to all of them was no. I was perfectly healthy and more enthusiastic about donating this time. They took a blood sample from my fingertip, which they test to find your blood type. I am ‘O+’, which can be given to anyone with a positive blood type. After hearing that, I was glad that my blood could help so many people. How good it is! It may reach someone in need, and who knows, it may even save a life.

Now the real test started. They called me in and asked me to drink water, which apparently makes your blood thinner. After 5 minutes they asked me to go to a hall where many people were already giving blood. Some were sitting on comfortable sofas while others were lying on beds, and all of them looked comfortable. They asked me to sit on a chair, which was pretty comfortable. I lay back and was feeling good. But when the guy picked up the needle that was about to be inserted into my vein, my heartbeat doubled. I turned my head to the other side. It felt like an ant’s bite. It was easy, no pain, a simple procedure. Blood started to flow out of my vein and I could feel it. Initially it was all fine. They then started some videos on the screen in the hall, and I watched them. The time it displayed was 4:30 pm.

All was good. After a few seconds the video stopped, leaving only the computer desktop on the screen, and they started playing music at low volume. The clock ticked to 4:31 pm. Now I was feeling a bit weak. I thought that should be normal, as I was losing blood. Time seemed to stall. The music was slow and was putting me to sleep. Wait, it was not the music, it was the weakness that was forcing me to sleep. I tried to stay awake and kept staring at the screen. Suddenly I felt darkness growing in my eyes. Dark shades were closing in on me from both sides. I could not make sense of what was happening. It should not have been happening. I tried to call the volunteer who was sitting next to me, but my voice would not come out of my mouth. It was the first time something like that had happened to me. My brain could not relate it to anything. I thought I should do something to stop it. I tried to raise my hand, but it felt so heavy that I couldn’t even lift it!

I knew I would be unconscious in some time. But somehow my body was in a comfortable position and was telling me to go to sleep, that everything would be fine. I gave up trying to alert the person next to me and went to sleep. The time was 4:32 pm. Whether I slept or became unconscious, I still don’t know.

4:35 pm. I was suddenly woken up by 5 or 6 people surrounding me, all asking, “Are you okay?” I was feeling too sleepy, or barely conscious, to know what to tell them, and even when I tried, my voice wouldn’t come out. They had already removed the needle from my left hand, and they took me to the nearest bed. The person who had been sitting next to me said, “Sir, your mobile”. I was puzzled: what mobile? Then he told me it had fallen out of my pocket and was under my chair. I thought, “How could that happen?” The volunteer said, “It’s okay, we will bring it to you. You just go and lie on the bed.” I noticed the time was now 4:36 pm.

I was feeling too weak, and I don’t know what happened between 4:33 and 4:35 pm. But somehow it was over, and now I was relaxing on the bed, pondering whether it had been a good or a bad experience. Somehow I felt more confident now; my fear of hospitals and their pungent smell seemed much less frightening. I now feel more confident that if I donate blood a second time, I won’t become unconscious. The fear that I had felt was gone. And it is a good feeling.

Yeah, now I understood the meaning of ‘Dar ke aage jeet hai’: victory lies beyond fear. I was feeling happy, a feeling I had not felt before. It is a moment to remember. After some rest, I went and had the refreshments they had provided. I was a little weak physically but strong mentally. It was a good experience that taught me a lot. I hope that if you face something like this, you too will come out stronger and a better person.

I am happy to share the letter of appreciation: