IPFS: Directed Acyclic Graphs Explained

November 8, 2022

InterPlanetary File System (IPFS) is a decentralized peer-to-peer storage protocol that is gaining rapid adoption throughout the Web3 ecosystem. Two technologies underpin it: Directed Acyclic Graphs (DAGs) and Distributed Hash Tables (DHTs).

Together, these two technologies let IPFS store content using content linking and content addressing. Content addressing is an alternative to the location addressing used by traditional systems such as HTTPS URLs, where each piece of content is identified by where it lives and is therefore highly susceptible to link rot. Link rot occurs when the URL associated with a specific piece of content changes, for example because an article, folder, or other asset on the domain is renamed.

Link rot is a common cause of broken links on the web. In comparison, content addressing is not susceptible to link rot. Instead of locating content by its URL on a web server, IPFS locates it by its content identifier, or CID. A CID is derived from a cryptographic hash of the file’s content, so any change to the file’s content, data, or metadata results in a new, unique CID.
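To make the idea concrete, here is a minimal sketch of content addressing in Python. It is an illustration only: the function name content_address is hypothetical, and a plain SHA-256 hex digest stands in for a real CID, which wraps the hash in additional encoding.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a SHA-256 hex digest of the content.

    Assumption: this digest is only a stand-in for a real IPFS CID.
    """
    return hashlib.sha256(data).hexdigest()


original = b"Hello, IPFS!"
edited = b"Hello, IPFS?"  # a single character changed

addr_original = content_address(original)
addr_edited = content_address(edited)

# The same bytes always produce the same address, wherever they are stored.
print(addr_original == content_address(b"Hello, IPFS!"))  # True

# Any change to the bytes produces a different address.
print(addr_original == addr_edited)  # False
```

Because the address depends only on the bytes, renaming or moving a file cannot break it, which is why this scheme avoids link rot.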

Let’s dive into Directed Acyclic Graphs and understand their importance in the inner workings of IPFS.

Directed Acyclic Graphs

At the most fundamental level, directed acyclic graphs (DAGs) are a hierarchical data structure.

To understand Directed Acyclic Graphs, let’s break down each part of the term.

By definition, a graph is a way to represent objects and the relationships between them. A directed graph is a graph whose edges have a direction, as showcased in the photo below. An acyclic graph is a graph in which following the edges never leads back to the object you started from, so no path forms a loop. In the context of IPFS, an object in a DAG is referred to as a node, and an edge is the relationship between two nodes.

One way to visualize a DAG is to imagine a family tree that shows ancestors and their relationship to one another. Each relationship has a direction downward from one ancestor to their descendants, and each lineage line has a definitive end.

Another way to visualize a DAG is to think about how content is stored in folders and subfolders on your local computer. You might have a Photos folder that contains one subfolder of animal photos and another subfolder of personal family photos. Each of these subfolders contains a different set of image files.
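The folder example can be sketched as a small directed graph in Python, together with a check that it really is acyclic. This is an illustrative sketch, not how IPFS stores data: the file names and the is_acyclic helper are hypothetical, and the check uses Kahn's algorithm, a standard way to test a directed graph for cycles.

```python
from collections import deque


def is_acyclic(graph: dict[str, list[str]]) -> bool:
    """Kahn's algorithm: a directed graph is acyclic if and only if every
    node can be removed in topological order."""
    indegree = {node: 0 for node in graph}
    for children in graph.values():
        for child in children:
            indegree[child] = indegree.get(child, 0) + 1

    # Start from nodes that nothing points to.
    queue = deque(node for node, degree in indegree.items() if degree == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in graph.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return removed == len(indegree)


# Each key is a node; each list holds the nodes its edges point to.
photos = {
    "Photos": ["Animals", "Family"],
    "Animals": ["cat.jpg", "dog.jpg"],
    "Family": ["reunion.jpg"],
    "cat.jpg": [],
    "dog.jpg": [],
    "reunion.jpg": [],
}

# A graph whose edges loop back on themselves is not a DAG.
looped = {"A": ["B"], "B": ["C"], "C": ["A"]}

print(is_acyclic(photos))  # True
print(is_acyclic(looped))  # False
```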

IPFS uses a specific kind of DAG called a Merkle DAG. In a Merkle DAG, each node’s identifier is a hash computed from the unique payload carried by the node and the list of identifiers of the content it links to. Merkle DAGs are self-verifying data structures, which means that the CID of an IPFS node is permanently tied to the contents of that node’s payload and of all that node’s descendants.
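The self-verifying property can be sketched in a few lines of Python. This is a simplified model under stated assumptions: node_id and verify are hypothetical helpers, SHA-256 hex digests stand in for CIDs, and the byte layout being hashed is invented for the example rather than taken from the IPFS node format.

```python
import hashlib


def node_id(payload: bytes, child_ids: list[str]) -> str:
    """Hash a node's payload together with the identifiers of its children."""
    h = hashlib.sha256()
    # A length prefix keeps the boundary between payload and links unambiguous.
    h.update(len(payload).to_bytes(8, "big"))
    h.update(payload)
    for child in child_ids:
        h.update(bytes.fromhex(child))
    return h.hexdigest()


def verify(claimed_id: str, payload: bytes, child_ids: list[str]) -> bool:
    """Recompute the identifier and compare it with the one that was requested."""
    return node_id(payload, child_ids) == claimed_id


leaf = node_id(b"cat photo bytes", [])
folder = node_id(b"Animals", [leaf])

# Anyone who receives the node can check it against the identifier they asked for.
print(verify(folder, b"Animals", [leaf]))            # True
print(verify(folder, b"Animals (renamed)", [leaf]))  # False
```

Since the folder's identifier covers the leaf's identifier, verifying the folder also pins down exactly which leaf it links to.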

Merkle DAGs are created when a file is uploaded to IPFS and is automatically divided into 256 kB chunks. Each chunk is given its own CID, and the root node combines these chunk CIDs into a single CID known as the root CID. The root CID is what end users receive once the file has been uploaded to IPFS.
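The chunk-then-combine step might look roughly like the following Python sketch. It is a simplification: the names chunk and root_id are hypothetical, the chunk size is taken as 256 * 1024 bytes, SHA-256 hex digests stand in for CIDs, and the way the root combines chunk identifiers is invented for illustration rather than being the encoding IPFS actually uses.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # assumed chunk size in bytes


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size pieces; the last piece may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def root_id(data: bytes) -> tuple[str, list[str]]:
    """Hash every chunk, then hash the ordered list of chunk identifiers."""
    chunk_ids = [hashlib.sha256(piece).hexdigest() for piece in chunk(data)]
    root = hashlib.sha256("".join(chunk_ids).encode()).hexdigest()
    return root, chunk_ids


# 600 KiB of non-repeating sample data, so every chunk is different.
data = b"".join(i.to_bytes(4, "big") for i in range(150 * 1024))

root, chunk_ids = root_id(data)
print(len(data))       # 614400
print(len(chunk_ids))  # 3 (two full chunks plus a shorter final one)
print(root)            # the single identifier a user would be handed
```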

A useful feature of Merkle DAGs is that a node can have multiple parent nodes, which allows for chunk de-duplication. De-duplication means that when identical content is stored on the network, it does not need to be stored or transmitted more than once, which saves both storage and bandwidth resources on the network.
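De-duplication falls out naturally once blocks are keyed by their hash, as this small Python sketch suggests. The BlockStore class is hypothetical and only models the idea with an in-memory dictionary; it is not an IPFS API.

```python
import hashlib


class BlockStore:
    """A toy content-addressed store: blocks are keyed by their SHA-256 digest."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self.blocks.setdefault(key, data)
        return key


store = BlockStore()
shared = b"logo image bytes"

# Two different parents link to the same shared chunk.
page_a = [store.put(b"page A text"), store.put(shared)]
page_b = [store.put(b"page B text"), store.put(shared)]

print(page_a[1] == page_b[1])  # True: both parents reference one identifier
print(len(store.blocks))       # 3: four puts, but only three blocks stored
```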

Some important things to note when learning about Merkle DAGs are:

  • Merkle DAG nodes are immutable, which means that any change in a node or its stored content alters the node’s identifier and affects all the ancestors of that node in the DAG, which in turn creates a brand new DAG.
  • Each node in a Merkle DAG is the root of its own subgraph, which is itself a Merkle DAG and is contained in the parent DAG within the network’s overarching data structure.
  • Merkle DAGs are built from the leaves up, which means that they start from nodes without child nodes. Parent nodes are added after child nodes because a child node’s identifier must be computed in advance before it can be linked from the parent node.
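The points above can be sketched in Python by building identifiers from the leaves up and then editing a single leaf. This is a toy model: build_ids, node_id, and the folder contents are hypothetical, and SHA-256 hex digests stand in for CIDs.

```python
import hashlib


def node_id(payload: bytes, child_ids: list[str]) -> str:
    """Identifier of a node: hash of its payload plus its children's identifiers."""
    h = hashlib.sha256(payload)
    for child in child_ids:
        h.update(child.encode())
    return h.hexdigest()


def build_ids(tree: dict[str, tuple[bytes, list[str]]], root: str) -> dict[str, str]:
    """Compute every node's identifier, visiting children before their parent."""
    ids: dict[str, str] = {}

    def visit(name: str) -> str:
        payload, children = tree[name]
        child_ids = [visit(child) for child in children]  # leaves come first
        ids[name] = node_id(payload, child_ids)
        return ids[name]

    visit(root)
    return ids


tree = {
    "Photos": (b"Photos", ["Animals", "Family"]),
    "Animals": (b"Animals", ["cat.jpg"]),
    "Family": (b"Family", ["reunion.jpg"]),
    "cat.jpg": (b"cat v1", []),
    "reunion.jpg": (b"reunion", []),
}
before = build_ids(tree, "Photos")

# Editing one leaf yields a new DAG rather than modifying the old one.
edited = dict(tree)
edited["cat.jpg"] = (b"cat v2", [])
after = build_ids(edited, "Photos")

changed = sorted(name for name in tree if before[name] != after[name])
print(changed)  # ['Animals', 'Photos', 'cat.jpg']
```

Only the edited leaf and its ancestors receive new identifiers. The Family subgraph keeps its identifiers, so it can be shared unchanged between the old DAG and the new one.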

Filebase: Geo-redundant IPFS Pinning

Filebase is a geo-redundant IPFS pinning service, where all files uploaded to an IPFS bucket are pinned with 3 copies stored across 3 unique geographic locations in the United States, London, and Frankfurt.

You can sign up for a free Filebase account to get started with your IPFS journey today.

If you have any questions, please join our Discord server, or send us an email at hello@filebase.com.