IPFS: Content Addressing Explained

November 8, 2022

The InterPlanetary File System, or IPFS, is best known for using content addressing to reference content stored on the network. With content addressing, every piece of content is referenced through a unique value known as its content identifier, or CID.

How does this differ from other ways of accessing content, such as location addressing, and why is content addressing so beneficial?

Location Addressing

When a webpage is stored on a web server, a web browser typically retrieves it over the HTTPS protocol so that you can view and interact with it. The webpage is stored on the server using location addressing.

Location addressing means that content stored on a server is accessed through its file path, which describes where the file sits in that server’s directory hierarchy.

For example, the following URL uses location addressing. It points to the IPFS Pinning Service API webpage, which is located in the API Documentation folder:

https://docs.filebase.com/api-documentation/ipfs-pinning-service-api

The location portion of the URL, api-documentation/ipfs-pinning-service-api, is the URL’s path (its final segment is often called the slug). If the page is renamed or moved, the old link stops working and returns an error page instead, such as a 404 Not Found message.

This is referred to as ‘link rot’, and it is one of the main disadvantages of location addressing. Any change to a website’s structure or URL scheme can leave pages reachable only through a new URL, which breaks every existing link to them.

Content Addressing

Content identifiers, often referred to as CIDs, are unique strings of characters associated with a piece of data stored on platforms that use content addressing, like IPFS. In contrast to location addressing, content addressing refers to what the stored data contains rather than where it is located on a web server.
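The contrast between the two schemes can be sketched with two small in-memory stores. This is only an illustration: the dictionaries, the new path, and the `content_key` helper are hypothetical, and a plain SHA-256 hex digest stands in for a real CID.

```python
import hashlib

OLD_PATH = "/api-documentation/ipfs-pinning-service-api"

# Location-addressed store: the key is a path chosen by whoever runs the server.
by_location = {OLD_PATH: b"<html>Pinning Service API docs</html>"}


def content_key(data: bytes) -> str:
    """Derive a key from the bytes themselves (a stand-in for a CID)."""
    return hashlib.sha256(data).hexdigest()


# Content-addressed store: the key depends only on what the data contains.
page = by_location[OLD_PATH]
by_content = {content_key(page): page}

# Reorganising the site moves the page to a new (hypothetical) path.
by_location["/docs/pinning-api"] = by_location.pop(OLD_PATH)

old_path_works = OLD_PATH in by_location                 # the old link has rotted
content_key_works = content_key(page) in by_content      # the content key is unaffected
```

After the move, the old path no longer resolves, while the content-derived key still finds the same bytes.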

What Are CIDs?

When a file is uploaded to IPFS, the file’s contents are used to generate a cryptographic hash value. This hash value is then combined with a small amount of self-describing metadata to form the file’s content identifier (CID).

With all IPFS CIDs, two things will always be true:

  • If the same file or folder is added to two separate IPFS nodes using the same settings and parameters, it will always produce the same CID value. Any difference in the content at all, such as metadata differences, will produce an entirely different CID.
  • For a given CID version, hashing algorithm, and encoding, the length of an IPFS CID is the same for all data files, regardless of the file’s size or content.
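Both properties follow from how cryptographic hash functions behave. The sketch below shows this with Python's standard `hashlib`. It hashes raw bytes directly, which is a simplification: a real IPFS node chunks and wraps the data before hashing, so these digests are not the values a node would embed in a CID.

```python
import hashlib


def digest(data: bytes) -> bytes:
    """Return the SHA2-256 digest of the given bytes."""
    return hashlib.sha256(data).digest()


small = b"hello"
same_small = b"hello"            # identical content, created separately
changed = b"hello!"              # one extra byte
large = b"hello" * 1_000_000     # roughly 5 MB of data

same_input_same_digest = digest(small) == digest(same_small)
any_change_new_digest = digest(small) != digest(changed)
digest_lengths = (len(digest(small)), len(digest(large)))
```

Identical input always gives an identical digest, a one-byte change gives a different digest, and the digest is 32 bytes long whether the input is 5 bytes or 5 MB.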

Each CID is built from the content’s cryptographic hash value. A CIDv1 can be broken down into four pieces:

  • Multibase Prefix: indicates which base encoding was used to turn the binary CID into a string.
  • CID-Version Identifier: indicates which version of IPFS CID (v0 or v1) has been used.
  • Multicodec Identifier: indicates the format in which the referenced content is encoded.
  • Multihash: a self-describing hash value that states which hashing algorithm was used and how long the digest is, followed by the digest itself.
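The four pieces can be made concrete with a short sketch that assembles a CIDv1 by hand and then takes it apart again. It assumes the base32 multibase prefix `b`, the `raw` multicodec (0x55), and SHA2-256 (0x12); the function names are hypothetical. Real CIDs encode each field as a varint, which this sketch treats as a single byte because all values used here are below 0x80. The string it produces describes one raw block of bytes, which is not necessarily the CID a node would report for a file, since that depends on chunking and import settings.

```python
import base64
import hashlib

CID_V1 = 0x01     # CID version identifier
RAW = 0x55        # multicodec code for raw binary content
SHA2_256 = 0x12   # multihash code for SHA2-256


def build_cidv1_raw(data: bytes) -> str:
    """Assemble version + codec + multihash, then base32-encode with a 'b' prefix."""
    digest = hashlib.sha256(data).digest()
    multihash = bytes([SHA2_256, len(digest)]) + digest
    cid_bytes = bytes([CID_V1, RAW]) + multihash
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


def parse_cidv1(cid: str) -> dict:
    """Split a base32 CIDv1 string into its four pieces (single-byte fields only)."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles the base32 ('b') multibase prefix")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)          # restore the padding base32 expects
    raw = base64.b32decode(body)
    version, codec, hash_code, hash_len = raw[0], raw[1], raw[2], raw[3]
    digest = raw[4:]
    if len(digest) != hash_len:
        raise ValueError("digest length does not match the multihash header")
    return {
        "multibase": cid[0],
        "version": version,
        "codec": codec,
        "hash_code": hash_code,
        "hash_length": hash_len,
        "digest": digest,
    }


cid = build_cidv1_raw(b"hello world")
parts = parse_cidv1(cid)
```

Because every field announces itself, a reader of the CID needs no outside information to know how the string was encoded, which format the content uses, or which hash function to verify it with.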

By default, IPFS uses the SHA2-256 algorithm, which produces a 32-byte digest, to generate the CID value for files uploaded to the network, though other hashing algorithms supported by the multihash format can be used instead.

There are currently two versions of CIDs, Version 0 (CIDv0) and Version 1 (CIDv1). CIDv0 CIDs use a base58-encoded multihash value for all content identifiers. This version is less versatile than the newer CIDv1, but it is still used as the default CID version for many IPFS workflows. All CIDv0 CIDs begin with the characters ‘Qm’.
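The ‘Qm’ prefix is a side effect of the encoding: a SHA2-256 multihash always starts with the bytes 0x12 0x20, and base58-encoding those 34 bytes always yields a 46-character string beginning with ‘Qm’. The sketch below shows this with a small hand-written base58 encoder. The function names are hypothetical, and the input is hashed as raw bytes for simplicity, whereas a real node hashes the wrapped block, so the result will not match what an IPFS node reports for the same data.

```python
import hashlib

# Bitcoin-style base58 alphabet (no 0, O, I, or l), as used by CIDv0.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc_encode(data: bytes) -> str:
    """Encode bytes as base58, keeping leading zero bytes as '1' characters."""
    number = int.from_bytes(data, "big")
    chars = []
    while number > 0:
        number, remainder = divmod(number, 58)
        chars.append(B58_ALPHABET[remainder])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(chars))


def cidv0_style(block: bytes) -> str:
    """Build a SHA2-256 multihash over the bytes and base58-encode it."""
    digest = hashlib.sha256(block).digest()
    multihash = bytes([0x12, 0x20]) + digest   # 0x12 = SHA2-256, 0x20 = 32 bytes
    return base58btc_encode(multihash)


example = cidv0_style(b"hello world")
another = cidv0_style(b"a completely different block")
```

Whatever the input, the two header bytes dominate the most significant digits of the encoded number, which is why every such string starts the same way.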

CIDv1 CIDs are the newer version of IPFS CIDs. They add a multibase prefix, which indicates the encoding used for the remainder of the CID, and a multicodec identifier, which specifies what format the content is stored in. Through these attributes, the first few bytes of a CIDv1 tell software how to interpret the rest of the CID and the content it refers to.

Benefits of Content Addressing

Content addressing is fundamental to Web3 because it avoids traditional restrictions such as vendor lock-in, centralized administration, and single points of failure. It also provides other benefits, such as:

  • Content Addressing Isn’t Affected By Link Rot: Since content addressing doesn’t rely on file paths remaining the same or domains staying online, link rot isn’t a concern for data that uses content addressing. As long as at least one node on the IPFS network provides the content behind a CID, the content can be accessed.
  • Immutability: Data files stored on IPFS are immutable in the sense that any change to the file’s contents or metadata results in a new, unique CID. This form of immutability shouldn’t be confused with permanent data storage, since files stored on IPFS can be deleted and removed from the network.

IPFS Pinning

To ensure that the content behind a CID is always available, it must be pinned to at least one IPFS node on the network.

IPFS pinning refers to the process of storing a file or folder within an IPFS node’s permanent storage instead of in the node’s cache storage. Unless a file is pinned, it’s stored in cache storage that is periodically cleared by the node’s garbage collection process.
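The relationship between pinning and garbage collection can be modelled in a few lines. The `ToyNode` class below is a hypothetical in-memory stand-in, not how an IPFS node is implemented, and it uses a SHA-256 hex digest in place of a real CID.

```python
import hashlib


class ToyNode:
    """Hypothetical stand-in for a node's block store and pin set."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}   # everything the node currently holds
        self.pins: set[str] = set()          # keys that must survive garbage collection

    def add(self, data: bytes, pin: bool = False) -> str:
        """Store data under a content-derived key, optionally pinning it."""
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data
        if pin:
            self.pins.add(key)
        return key

    def garbage_collect(self) -> list[str]:
        """Drop every block that is not pinned and return the removed keys."""
        removed = [key for key in self.blocks if key not in self.pins]
        for key in removed:
            del self.blocks[key]
        return removed


node = ToyNode()
pinned_key = node.add(b"keep me", pin=True)
cached_key = node.add(b"temporary")
removed_keys = node.garbage_collect()
```

After garbage collection runs, only the pinned content remains on the node, while the unpinned content is gone and would have to be fetched from another node that still holds it.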

You can sign up for a free Filebase account to get started with your IPFS journey today.

If you have any questions, please join our Discord server, or send us an email at hello@filebase.com.