IPFS: Publishing To The DHT

November 8, 2022

As the InterPlanetary File System (IPFS) gains popularity and adoption across the Web3 space, it’s quickly becoming the default storage method for the decentralized web. This growth can be attributed to two simple yet powerful features of the protocol: data publicity and content addressing.

As with many Web3 tools and infrastructure, understanding how the technology works behind the scenes is vital to using it properly and making sound decisions for your own use case.

At the fundamental level, IPFS is a decentralized peer-to-peer file storage protocol. It isn’t inherently a decentralized storage network like Filecoin or Sia. IPFS uses two technologies on the backend that set it apart from other decentralized file storage solutions: Directed Acyclic Graphs (DAGs) and Distributed Hash Tables (DHTs).

What is a Directed Acyclic Graph (DAG)?

Directed Acyclic Graphs (DAGs) are data structures created when a file is uploaded to IPFS. A graph is a way of representing objects and the relationships between them. A directed graph is one whose edges have a direction; an acyclic graph is one whose edges never loop back to an object that has already been visited.

IPFS uses a type of DAG known as a Merkle DAG. Merkle DAGs allow nodes to have multiple parent nodes, which makes chunk de-duplication possible: because each node is identified by a hash of its content, identical chunks resolve to the same node. De-duplication means that when duplicate content is stored on the network, it does not need to be stored or transmitted more than once. This saves storage and bandwidth across the network. Merkle DAGs are agnostic to where content is stored and do not need to be updated when a file is replicated or removed.
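The de-duplication described above can be sketched in a few lines of Python. This is a simplified illustration, not how IPFS actually encodes data: the 4-byte chunk size, the use of a plain SHA-256 hex digest in place of a real CID, and the names `block_id` and `add_file` are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, chosen only to keep the example readable


def block_id(data):
    """Stand-in for a CID: the SHA-256 hex digest of the block's bytes."""
    return hashlib.sha256(data).hexdigest()


def add_file(store, content):
    """Split content into chunks, store each unique chunk once, return the root id."""
    child_ids = []
    for start in range(0, len(content), CHUNK_SIZE):
        chunk = content[start:start + CHUNK_SIZE]
        cid = block_id(chunk)
        # setdefault only writes the chunk if this id is not stored yet
        store.setdefault(cid, chunk)
        child_ids.append(cid)
    # The root block lists its children, so its id depends on all of them
    root = "".join(child_ids).encode()
    root_id = block_id(root)
    store.setdefault(root_id, root)
    return root_id


store = {}
root_a = add_file(store, b"AAAABBBBCCCC")
root_b = add_file(store, b"AAAABBBBDDDD")
```

The two files share the chunks `AAAA` and `BBBB`, so those blocks are stored once and are referenced by both root blocks. That is the sense in which a node can have more than one parent.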

What is a Distributed Hash Table (DHT)?

A Distributed Hash Table (DHT) is a method of mapping keys to their associated values. DHTs are fundamentally databases of key-value pairs that are split across all the peers on a distributed network.

To upload or retrieve data stored on IPFS, there must be a mapping between each node’s PeerID and the content identifiers (CIDs) that the peer is storing. The DHT maintains this mapping, associating each CID with the PeerIDs of the nodes that host it.
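As a rough mental model, the mapping can be pictured as a table of CID to PeerIDs that is sliced up so each DHT peer holds only part of it. The sketch below is a toy under stated assumptions: the class `ToyDHT`, its methods, and the rule of storing each record on the single peer whose hashed ID is closest to the hashed CID (by XOR) are illustrative simplifications, not the actual IPFS implementation.

```python
import hashlib


def key_of(name):
    """Map a peer ID or CID string into one shared integer keyspace."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


class ToyDHT:
    def __init__(self, server_ids):
        # Each server peer holds its own slice of the CID -> providers table
        self.servers = {peer_id: {} for peer_id in server_ids}

    def _closest_server(self, cid):
        # Pick the server whose key has the smallest XOR distance to the CID's key
        return min(self.servers, key=lambda peer_id: key_of(peer_id) ^ key_of(cid))

    def provide(self, cid, provider_peer_id):
        """Record that provider_peer_id hosts cid."""
        table = self.servers[self._closest_server(cid)]
        table.setdefault(cid, set()).add(provider_peer_id)

    def find_providers(self, cid):
        """Return the set of PeerIDs known to host cid (empty if none)."""
        return self.servers[self._closest_server(cid)].get(cid, set())


dht = ToyDHT(["peerA", "peerB", "peerC"])
dht.provide("cid1", "peerX")
dht.provide("cid1", "peerY")
providers = dht.find_providers("cid1")
```

Because `provide` and `find_providers` apply the same distance rule, a lookup lands on the same peer that stored the record without any central index.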

When a peer joins the network, the DHT classifies it as either a DHT server or a DHT client. DHT servers have public IP addresses that can be reached from the public Internet. DHT clients, in comparison, have private IP addresses and typically sit behind Network Address Translation (NAT), so other peers cannot reach them directly.

When a new peer joins the network, it is a DHT client by default. It then initiates connections with other peers on the network to begin participating. Once three or more peers connect to the newly joined peer, it upgrades to act as a DHT server. DHT servers can perform all functions on the network, such as storing content and maintaining and serving CIDs and their associated PeerIDs when requested.

In comparison, DHT clients can only request content from the network; they do not store or provide content to other peers. This arrangement keeps publicly unreachable peers out of other peers’ routing tables, where they can cause problems in the file publication and retrieval process.
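The client-to-server upgrade described above amounts to a small state machine. The sketch below models only that rule; the `Peer` class, its method names, and the decision to count distinct inbound peers are assumptions for illustration, with the threshold of three taken from the description above.

```python
SERVER_THRESHOLD = 3  # "three or more peers", as described in the text


class Peer:
    def __init__(self, peer_id):
        self.peer_id = peer_id
        self.mode = "client"  # every new peer starts as a DHT client
        self.inbound = set()  # distinct peers that managed to connect to us

    def on_inbound_connection(self, remote_peer_id):
        """Record an inbound connection and upgrade once enough peers reach us."""
        self.inbound.add(remote_peer_id)
        if len(self.inbound) >= SERVER_THRESHOLD:
            self.mode = "server"

    def can_serve_records(self):
        """Only DHT servers store and serve CID -> PeerID records."""
        return self.mode == "server"


peer = Peer("new-peer")
for remote in ["peer-1", "peer-2", "peer-3"]:
    peer.on_inbound_connection(remote)
```

Inbound connections work as the signal because a peer that others can dial is, by definition, publicly reachable.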

The flowchart below shows how the IPFS content retrieval process works.

Why Does Publishing to the DHT Matter?

When content is uploaded to IPFS, its CID isn’t automatically published to the DHT servers. If you’re pinning content to IPFS through a personal IPFS node or an IPFS pinning service other than Filebase, your content isn’t benefiting from the best possible performance and content retrieval times.

As shown in the flowchart above, if the CID isn’t found in the DHT, the request enters a loop in which it keeps searching for providers of the CID until one is found. If no provider is found in time, the request times out and the content isn’t retrieved. This happens because some pinning services don’t publish their provider records to the IPFS DHT servers.
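The search-until-found-or-timeout behavior can be sketched as a polling loop. This is a hedged illustration rather than the real retrieval code: the function `retrieve`, its parameters, the 60-second default timeout, and the half-second polling interval are all assumptions, and `find_providers` stands in for whatever lookup the node performs against the DHT.

```python
import time


def retrieve(cid, find_providers, timeout_s=60.0, poll_interval_s=0.5,
             clock=time.monotonic, sleep=time.sleep):
    """Keep asking for providers of cid until one appears or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        providers = find_providers(cid)
        if providers:
            # At least one peer advertises the CID, so the content can be fetched
            return sorted(providers)
        if clock() >= deadline:
            raise TimeoutError("no providers found for " + cid)
        sleep(poll_interval_s)


# A published CID resolves on the first lookup, with no waiting
found = retrieve("example-cid", lambda cid: {"peer-1"})
```

When a provider record is already in the DHT, the loop returns on its first pass. When no record has been published, the caller waits out the full timeout.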

When these records aren’t published, content retrieval can take up to a minute, compared with just a few seconds for content pinned to IPFS through Filebase. This is because Filebase publishes all of its provider records to IPFS DHT servers for the best performance and fastest retrieval time possible when a CID is requested.

Filebase: Geo-redundant IPFS Pinning

When content is uploaded to an IPFS bucket on Filebase, its CID is published to the IPFS DHT servers and the content is pinned with three copies stored across three geographic locations: the United States, London, and Frankfurt.

Through a combination of DHT publication and geo-redundant replication, CIDs hosted through Filebase can be retrieved in a fraction of the time it takes with other IPFS pinning providers.

You can sign up for a free Filebase account to get started with your IPFS journey today.

If you have any questions, please join our Discord server, or send us an email at hello@filebase.com.