Performance and Storage Comparison Between Pathfinder and Juno

Reddio runs both Juno and Pathfinder nodes and provides an Ethereum/Starknet RPC service at https://www.reddio.com/node.

This article compares the performance and storage usage of these two nodes. To confirm that both test nodes are synchronized to the latest block, we first send each of them a cURL request:

Juno:

curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":0,"method":"starknet_blockHashAndNumber"}' http://localhost:6060
{
  "jsonrpc": "2.0",
  "result": {
    "block_hash": "0x8e366ce31c3e4e34b5b5d31fd0564033966721cf93853a77b3a5741aba5188",
    "block_number": 451104
  },
  "id": 0
}

Pathfinder:

curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":0,"method":"starknet_blockHashAndNumber"}' http://localhost:9545
{
  "jsonrpc": "2.0",
  "result": {
    "block_hash": "0x7196ea11a064fc6096f383c171dbb17a27be9870aadedad5b37dc5fd4372602",
    "block_number": 451106
  },
  "id": 0
}

The block_number values in the two responses differ by only 2, so the two nodes can be considered synchronized.
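
The same check can be scripted. The following is a minimal sketch that takes the two JSON-RPC responses shown above and computes the gap between them; the helper names and the max_lag threshold are illustrative assumptions, not part of either client.

```python
def block_lag(response_a: dict, response_b: dict) -> int:
    """Return the absolute difference between two nodes' latest block numbers."""
    number_a = response_a["result"]["block_number"]
    number_b = response_b["result"]["block_number"]
    return abs(number_a - number_b)


def is_synchronized(response_a: dict, response_b: dict, max_lag: int = 5) -> bool:
    # max_lag is an illustrative threshold chosen for this sketch.
    return block_lag(response_a, response_b) <= max_lag


# The two starknet_blockHashAndNumber responses from the cURL calls above.
juno_response = {
    "jsonrpc": "2.0",
    "result": {
        "block_hash": "0x8e366ce31c3e4e34b5b5d31fd0564033966721cf93853a77b3a5741aba5188",
        "block_number": 451104,
    },
    "id": 0,
}
pathfinder_response = {
    "jsonrpc": "2.0",
    "result": {
        "block_hash": "0x7196ea11a064fc6096f383c171dbb17a27be9870aadedad5b37dc5fd4372602",
        "block_number": 451106,
    },
    "id": 0,
}

print("lag:", block_lag(juno_response, pathfinder_response))
print("synchronized:", is_synchronized(juno_response, pathfinder_response))
```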

Storage Space

First, let's examine the differences in local storage space between Pathfinder and Juno.

Pathfinder:

total 713G
drwxrwxrwx 2 root   root   4.0K Nov  4 11:26 ./
drwxr-xr-x 6 root   root   4.0K Nov 14 06:07 ../
-rwxrwxrwx 1 root   root   712G Dec  2 04:03 mainnet.sqlite*
-rwxrwxrwx 1 ubuntu ubuntu 1.7M Dec  2 04:03 mainnet.sqlite-shm*
-rwxrwxrwx 1 ubuntu ubuntu 863M Dec  2 04:03 mainnet.sqlite-wal*

Juno:

total 125G
drwxr-xr-x 2 root root  260K Dec  2 04:03 ./
drwxr-xr-x 3 root root  4.0K Dec  2 04:02 ../
-rw-r--r-- 1 root root  8.1M Jul 21 22:41 002540.sst
-rw-r--r-- 1 root root  8.1M Jul 21 22:41 002861.sst
...
-rw-r--r-- 1 root root  3.9M Jul 21 23:05 015018.sst
-rw-r--r-- 1 root root   33M Jul 23 20:03 999203.sst
-rw-r--r-- 1 root root    17 Nov 30 23:40 CURRENT
-rw-r--r-- 1 root root     0 Nov 30 03:58 LOCK
-rw-r--r-- 1 root root  129M Nov 30 23:40 MANIFEST-4682254
-rw-r--r-- 1 root root  110M Dec  2 04:03 MANIFEST-5715011
-rw-r--r-- 1 root root  1.1K Nov 30 03:58 OPTIONS-4682255

The difference is significant: Pathfinder's database occupies 713G, while Juno's occupies 125G, so Pathfinder uses about 5.7 times as much storage as Juno.
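
For reference, the ratio follows directly from the two "total" lines in the listings above. A small sketch of the arithmetic (the function and variable names are made up for illustration):

```python
def storage_ratio(larger_g: float, smaller_g: float) -> float:
    """Return how many times larger the first directory is than the second."""
    return larger_g / smaller_g


# Totals reported by the two directory listings above, in G.
pathfinder_total_g = 713
juno_total_g = 125

ratio = storage_ratio(pathfinder_total_g, juno_total_g)
extra_g = pathfinder_total_g - juno_total_g
print(f"Pathfinder uses {ratio:.1f}x the space of Juno ({extra_g}G more)")
```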

Storage Structure

Pathfinder

Pathfinder uses an SQLite database, and the internal table structure can be viewed using SQLite3:

sqlite> .tables
block_headers                    starknet_events                
canonical_blocks                 starknet_events_keys_03        
casm_compiler_versions           starknet_events_keys_03_config 
casm_definitions                 starknet_events_keys_03_data   
class_commitment_leaves          starknet_events_keys_03_docsize
class_definitions                starknet_events_keys_03_idx    
class_roots                      starknet_transactions          
contract_roots                   starknet_versions              
contract_state_hashes            storage_roots                  
contract_updates                 storage_updates                
l1_state                         trie_class                     
nonce_updates                    trie_contracts                 
refs                             trie_storage

The relationships between the various tables are shown in the following diagram:

The complete schema can be found at Pathfinder GitHub.

Juno

Juno uses the Pebble KV database, so it has no relational table structure. Instead, it prefixes its keys to achieve a similar effect. The prefixes are defined as follows:

const (
	StateTrie         Bucket = iota // state metadata (e.g., the state root)
	Unused                          // Previously held contract storage roots and is now unused. May be reused in the future.
	ContractClassHash               // maps contract addresses and class hashes
	ContractStorage                 // contract storages
	Class                           // maps class hashes to classes
	ContractNonce                   // contract nonce
	ChainHeight                     // Latest height of the blockchain
	BlockHeaderNumbersByHash
	BlockHeadersByNumber
	TransactionBlockNumbersAndIndicesByHash // maps transaction hashes to block number and index
	TransactionsByBlockNumberAndIndex       // maps block number and index to transaction
	ReceiptsByBlockNumberAndIndex           // maps block number and index to transaction receipt
	StateUpdatesByBlockNumber
	ClassesTrie
	ContractStorageHistory
	ContractNonceHistory
	ContractClassHashHistory
	ContractDeploymentHeight
	L1Height
	SchemaVersion
	Pending
	BlockCommitments
	Temporary // used temporarily for migrations
	SchemaIntermediateState
)

The relationships between these prefixes create a structure similar to a relational database. A Key function flattens the prefix and a series of byte arrays into a single []byte for KV operations.
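
To make the flattening concrete, here is a rough Python sketch of the idea. It assumes the prefix is encoded as a single byte equal to the bucket's position in the list above, and that block numbers are encoded as 8-byte big-endian integers; both are assumptions made for illustration, and the names key and marshal_block_number are not Juno's actual API.

```python
from enum import IntEnum


class Bucket(IntEnum):
    # Positions follow the order of the const block above (iota starts at 0).
    # Only the buckets used in this sketch are listed.
    STATE_TRIE = 0
    CHAIN_HEIGHT = 6
    BLOCK_HEADER_NUMBERS_BY_HASH = 7
    BLOCK_HEADERS_BY_NUMBER = 8


def key(bucket: Bucket, *parts: bytes) -> bytes:
    """Flatten a bucket prefix and any number of byte strings into one key."""
    # Assumption: the prefix occupies exactly one byte.
    return bytes([bucket]) + b"".join(parts)


def marshal_block_number(number: int) -> bytes:
    # Assumption: 8-byte big-endian, so keys sort in block order.
    return number.to_bytes(8, "big")


header_key = key(Bucket.BLOCK_HEADERS_BY_NUMBER, marshal_block_number(451104))
height_key = key(Bucket.CHAIN_HEIGHT)
print(header_key.hex(), height_key.hex())
```

Because every key in a bucket shares the same leading byte, a range scan over that prefix plays roughly the role that a table scan plays in a relational database.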

Operational Logic

Take a simple request as an example: fetching the latest synced block.

{
  "jsonrpc": "2.0",
  "id": 0,
  "method": "starknet_blockHashAndNumber"
}

Juno

In Juno, this operation is executed as follows:

// BlockHashAndNumber returns the block hash and number of the latest synced block.
//
// It follows the specification defined here:
// https://github.com/starkware-libs/starknet-specs/blob/a789ccc3432c57777beceaa53a34a7ae2f25fda0/api/starknet_api_openrpc.json#L517
func (h *Handler) BlockHashAndNumber() (*BlockHashAndNumber, *jsonrpc.Error) {
	block, err := h.bcReader.Head()
	if err != nil {
		return nil, ErrNoBlock
	}
	return &BlockHashAndNumber{Number: block.Number, Hash: block.Hash}, nil
}

The Head function:

func (b *Blockchain) Head() (*core.Block, error) {
	b.listener.OnRead("Head")
	var h *core.Block
	return h, b.database.View(func(txn db.Transaction) error {
		var err error
		h, err = head(txn)
		return err
	})
}

...

func head(txn db.Transaction) (*core.Block, error) {
	height, err := chainHeight(txn)
	if err != nil {
		return nil, err
	}
	return BlockByNumber(txn, height)
}

...

func chainHeight(txn db.Transaction) (uint64, error) {
	var height uint64
	return height, txn.Get(db.ChainHeight.Key(), func(val []byte) error {
		height = binary.BigEndian.Uint64(val)
		return nil
	})
}

...

// BlockByNumber retrieves a block from the database by its number
func BlockByNumber(txn db.Transaction, number uint64) (*core.Block, error) {
	header, err := blockHeaderByNumber(txn, number)
	if err != nil {
		return nil, err
	}

	block := new(core.Block)
	block.Header = header
	block.Transactions, err = transactionsByBlockNumber(txn, number)
	if err != nil {
		return nil, err
	}

	block.Receipts, err = receiptsByBlockNumber(txn, number)
	if err != nil {
		return nil, err
	}
	return block, nil
}

...

// blockHeaderByNumber retrieves a block header from the database by its number
func blockHeaderByNumber(txn db.Transaction, number uint64) (*core.Header, error) {
	numBytes := core.MarshalBlockNumber(number)

	var header *core.Header
	if err := txn.Get(db.BlockHeadersByNumber.Key(numBytes), func(val []byte) error {
		header = new(core.Header)
		return encoder.Unmarshal(val, header)
	}); err != nil {
		return nil, err
	}
	return header, nil
}

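The chain height is first read with txn.Get(db.ChainHeight.Key()), and the header of the latest block is then retrieved with the txn.Get(db.BlockHeadersByNumber.Key(numBytes)) KV operation. Note that BlockByNumber also loads the block's transactions and receipts, even though this RPC method only returns the block number and hash.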
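
The lookup chain above boils down to two dependent key lookups. The sketch below imitates it with a plain Python dict standing in for Pebble. The one-byte prefixes (6 and 8, taken from the order of the const block), the 8-byte big-endian encoding, and the use of JSON as the value encoding are all simplifying assumptions for illustration and do not reflect Juno's actual on-disk format.

```python
import json

# Assumed one-byte prefixes, based on the order of the const block.
CHAIN_HEIGHT = 6
BLOCK_HEADERS_BY_NUMBER = 8


def key(prefix: int, *parts: bytes) -> bytes:
    """Build a flat key from a one-byte prefix and optional byte strings."""
    return bytes([prefix]) + b"".join(parts)


def chain_height(store: dict) -> int:
    """First lookup: read the latest block height."""
    return int.from_bytes(store[key(CHAIN_HEIGHT)], "big")


def head_header(store: dict) -> dict:
    """Second lookup: read the header stored under the latest height."""
    height = chain_height(store)
    raw = store[key(BLOCK_HEADERS_BY_NUMBER, height.to_bytes(8, "big"))]
    return json.loads(raw)  # JSON is a stand-in for the real encoder


# A toy store holding two headers and the chain height.
store = {key(CHAIN_HEIGHT): (451104).to_bytes(8, "big")}
for number, block_hash in [(451103, "0xaaa"), (451104, "0xbbb")]:
    header = json.dumps({"number": number, "hash": block_hash}).encode()
    store[key(BLOCK_HEADERS_BY_NUMBER, number.to_bytes(8, "big"))] = header

print(head_header(store))
```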

Pathfinder

Pathfinder follows a more traditional approach with database operations. The relevant code is as follows:

tx.block_id(BlockId::Latest)
    .context("Reading latest block hash and number from database")?
    .map(|(block_number, block_hash)| BlockHashAndNumber {
        block_hash,
        block_number,
    })
    .ok_or(BlockNumberError::NoBlocks)

The block_id function:

BlockId::Latest => tx.inner().query_row(
        "SELECT number, hash FROM canonical_blocks ORDER BY number DESC LIMIT 1",
        [],
        |row| {
            let number = row.get_block_number(0)?;
            let hash = row.get_block_hash(1)?;

            Ok((number, hash))
        },
    ),

This match arm retrieves the latest block number and hash from the canonical_blocks table with a single SQLite query.
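
The query can be tried in isolation with Python's built-in sqlite3 module. The sketch below runs the same SELECT against an in-memory database; the two-column table definition and the sample rows are assumptions for illustration and are not Pathfinder's real schema.

```python
import sqlite3

LATEST_BLOCK_SQL = (
    "SELECT number, hash FROM canonical_blocks ORDER BY number DESC LIMIT 1"
)


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified canonical_blocks table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE canonical_blocks (number INTEGER PRIMARY KEY, hash BLOB NOT NULL)"
    )
    return conn


def latest_block(conn: sqlite3.Connection):
    """Return (number, hash) of the highest block, or None if the table is empty."""
    return conn.execute(LATEST_BLOCK_SQL).fetchone()


conn = make_db()
conn.executemany(
    "INSERT INTO canonical_blocks (number, hash) VALUES (?, ?)",
    [(451104, b"\x01"), (451105, b"\x02"), (451106, b"\x03")],  # sample rows
)
print(latest_block(conn))
```

With number as the integer primary key, SQLite can answer the ORDER BY ... DESC LIMIT 1 query by reading the last entry of the table's B-tree instead of sorting every row.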

Performance Comparison

Merely examining the code doesn't provide an intuitive understanding of the performance difference between the two. To evaluate the performance, a script was written to perform the following:

  • Use the starknet_getBlockWithTxs method to randomly request blocks between 0 and the latest block.
  • Send 1000 requests each to Pathfinder and Juno for this method.
  • Pathfinder and Juno use identical machine configurations on Google Cloud: c3-standard-8 with a 2TB SSD persistent disk.

The script:

import requests
import random
import time

url1 = "https://reddio-juno-test.reddio.com"
url2 = "https://reddio-pathfinder-test.reddio.com"
num_requests = 1000

total_time_url1 = 0
total_time_url2 = 0

for _ in range(num_requests):
    # Generate a random block number
    block_number = random.randint(0, 459353)

    # Construct the request payload
    payload = {
        "jsonrpc": "2.0",
        "method": "starknet_getBlockWithTxs",
        "params": [{"block_number": block_number}],
        "id": 0
    }

    # Send the request to the first URL
    start_time = time.time()
    response1 = requests.post(url1, json=payload)
    end_time = time.time()
    total_time_url1 += end_time - start_time

    # Send the request to the second URL
    start_time = time.time()
    response2 = requests.post(url2, json=payload)
    end_time = time.time()
    total_time_url2 += end_time - start_time

# Calculate the average response time
average_time_url1 = total_time_url1 / num_requests
average_time_url2 = total_time_url2 / num_requests

# Output results
print(f"Total time for {num_requests} requests to {url1}: {total_time_url1} seconds")
print(f"Average time per request to {url1}: {average_time_url1} seconds")

print(f"\nTotal time for {num_requests} requests to {url2}: {total_time_url2} seconds")
print(f"Average time per request to {url2}: {average_time_url2} seconds")

Execution results:

Total time for 1000 requests to https://reddio-juno-test.reddio.com: 170.26045179367065 seconds
Average time per request to https://reddio-juno-test.reddio.com: 0.17026045179367066 seconds

Total time for 1000 requests to https://reddio-pathfinder-test.reddio.com: 231.7342541217804 seconds
Average time per request to https://reddio-pathfinder-test.reddio.com: 0.2317342541217804 seconds

For this specific load, Juno is clearly faster than Pathfinder: its average response time is about 27% lower (0.170 s versus 0.232 s), which means it serves roughly 36% more requests in the same time. However, this result is specific to the starknet_getBlockWithTxs method. The advantage might not be as pronounced with other types of load.
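
"Faster" can be read in two ways, so it helps to derive both figures from the totals printed above. A small sketch (the function name and the dictionary keys are chosen for illustration):

```python
def compare_totals(total_a: float, total_b: float, num_requests: int) -> dict:
    """Compare backend A against backend B using their total elapsed times."""
    avg_a = total_a / num_requests
    avg_b = total_b / num_requests
    return {
        "avg_a": avg_a,
        "avg_b": avg_b,
        # Fraction by which A's average latency is lower than B's.
        "latency_reduction": (avg_b - avg_a) / avg_b,
        # How many requests A completes in the time B needs for one.
        "speedup": avg_b / avg_a,
    }


# Totals from the execution results above (A = Juno, B = Pathfinder).
result = compare_totals(170.26045179367065, 231.7342541217804, 1000)
print(f"latency reduction: {result['latency_reduction']:.1%}")
print(f"speedup: {result['speedup']:.2f}x")
```

Both figures describe the same measurement; stating which one is meant avoids ambiguity when comparing runs.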

We then added a second test:

  • Use the starknet_getEvents method to request events over a random block range that starts between block 0 and block 2000 and ends at or before block 459353.

At this point, the code looks like this:

import requests
import random
import time

url1 = "https://reddio-juno-test.reddio.com"
url2 = "https://reddio-pathfinder-test.reddio.com"
num_requests = 500

total_time_url1 = 0
total_time_url2 = 0

for _ in range(num_requests):
    # Generate a random block_number
    block_number = random.randint(0, 459353)

    # Construct Payload for starknet_getBlockWithTxs
    payload_getBlock = {
        "jsonrpc": "2.0",
        "method": "starknet_getBlockWithTxs",
        "params": [{"block_number": block_number}],
        "id": 0
    }

    # Send request to the first address for starknet_getBlockWithTxs
    start_time = time.time()
    response1 = requests.post(url1, json=payload_getBlock)
    end_time = time.time()
    total_time_url1 += end_time - start_time

    # Send request to the second address for starknet_getBlockWithTxs
    start_time = time.time()
    response2 = requests.post(url2, json=payload_getBlock)
    end_time = time.time()
    total_time_url2 += end_time - start_time

    # Generate random fromBlock and toBlock for starknet_getEvents
    from_block = random.randint(0, 2000)
    to_block = random.randint(from_block, 459353)

    # Construct Payload for starknet_getEvents
    payload_getEvents = {
        "jsonrpc": "2.0",
        "method": "starknet_getEvents",
        "params": [
            {"fromBlock": from_block, "toBlock": to_block, "page_size": 1000, "page_number": 0, "chunk_size": 10}
        ],
        "id": 1
    }

    # Send request to the first address for starknet_getEvents
    start_time = time.time()
    response1 = requests.post(url1, json=payload_getEvents)
    end_time = time.time()
    total_time_url1 += end_time - start_time

    # Send request to the second address for starknet_getEvents
    start_time = time.time()
    response2 = requests.post(url2, json=payload_getEvents)
    end_time = time.time()
    total_time_url2 += end_time - start_time

# Calculate average response time
average_time_url1 = total_time_url1 / (2 * num_requests)  # Multiply by 2 for the two types of requests
average_time_url2 = total_time_url2 / (2 * num_requests)

# Output results
print(f"Total time for {2 * num_requests} requests to {url1}: {total_time_url1} seconds")
print(f"Average time per request to {url1}: {average_time_url1} seconds")

print(f"\nTotal time for {2 * num_requests} requests to {url2}: {total_time_url2} seconds")
print(f"Average time per request to {url2}: {average_time_url2} seconds")

This time, the results are as follows:

Total time for 1000 requests to https://reddio-juno-test.reddio.com: 102.74015641212463 seconds
Average time per request to https://reddio-juno-test.reddio.com: 0.10274015641212464 seconds

Total time for 1000 requests to https://reddio-pathfinder-test.reddio.com: 108.80669522285461 seconds
Average time per request to https://reddio-pathfinder-test.reddio.com: 0.10880669522285462 seconds

As we can see, under a mixed load of starknet_getBlockWithTxs and starknet_getEvents, Juno's advantage is much less pronounced: its average response time is only about 6% lower than Pathfinder's. This suggests that the underlying KV store performs relatively worse for certain requests. In such cases, if our load balancer routes each request to Juno or Pathfinder based on the method being called, we can get the best performance out of both.
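
As a rough sketch of that routing idea, the snippet below builds a method-to-backend table from per-method average latencies and picks a backend for each incoming JSON-RPC request. The starknet_getBlockWithTxs averages come from the first test above; the starknet_getEvents averages are hypothetical placeholders, since the mixed test does not report per-method timings. The function names are illustrative and not part of any existing load balancer.

```python
JUNO_URL = "https://reddio-juno-test.reddio.com"
PATHFINDER_URL = "https://reddio-pathfinder-test.reddio.com"

# Average seconds per request, per method and backend.
avg_latency = {
    # Measured in the first test above.
    "starknet_getBlockWithTxs": {JUNO_URL: 0.170, PATHFINDER_URL: 0.232},
    # HYPOTHETICAL placeholder values; measure per method before relying on them.
    "starknet_getEvents": {JUNO_URL: 0.050, PATHFINDER_URL: 0.040},
}


def build_routes(latency: dict) -> dict:
    """Map each method to the backend with the lowest average latency."""
    return {
        method: min(per_backend, key=per_backend.get)
        for method, per_backend in latency.items()
    }


def pick_backend(request: dict, routes: dict, default: str) -> str:
    """Choose a backend for a JSON-RPC request, falling back to a default."""
    return routes.get(request.get("method"), default)


routes = build_routes(avg_latency)
request = {"jsonrpc": "2.0", "id": 0, "method": "starknet_getBlockWithTxs",
           "params": [{"block_number": 1000}]}
print(pick_backend(request, routes, default=JUNO_URL))
```

Keeping the table data-driven means the routing can be regenerated whenever the benchmark is rerun, instead of hard-coding which client handles which method.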