Bitcoind since 0.8 maintains two databases, the block index (in $DATADIR/blocks/index) and the chainstate (in $DATADIR/chainstate). The block index maintains information for every block, and where it is stored on disk. The chain state maintains information about the resulting state of validation as a result of the currently best known chain.
Inside the block index, the key/value pairs used are:
'b' + 32-byte block hash -> block index record. Each record stores:
The block header.
The height.
The number of transactions.
To what extent this block is validated.
In which file, and where in that file, the block data is stored.
In which file, and where in that file, the undo data is stored.
'f' + 4-byte file number -> file information record. Each record stores:
The number of blocks stored in the block file with that number.
The size of the block file with that number ($DATADIR/blocks/blkNNNNN.dat).
The size of the undo file with that number ($DATADIR/blocks/revNNNNN.dat).
The lowest and highest height of blocks stored in the block file with that number.
The lowest and highest timestamp of blocks stored in the block file with that number.
'l' -> 4-byte file number: the last block file number used.
'R' -> 1-byte boolean ('1' if true): whether we're in the process of reindexing.
'F' + 1-byte flag name length + flag name string -> 1 byte boolean ('1' if true, '0' if false): various flags that can be on or off. Currently defined flags include:
'txindex': Whether the transaction index is enabled.
't' + 32-byte transaction hash -> transaction index record. These are optional and only exist if 'txindex' is enabled (see above). Each record stores:
Which block file number the transaction is stored in.
Which offset into that file the block the transaction is part of is stored at.
The offset from the start of that block to the position where that transaction itself is stored.
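The key layouts listed above can be sketched in Python roughly as follows. This is only an illustration, not Bitcoin Core's own code: the helper names are hypothetical, and it assumes that hashes are stored byte-reversed relative to their usual hex display form and that the 4-byte file number is little-endian.

```python
import struct


def block_index_key(block_hash_hex):
    """'b' + 32-byte block hash (assumed byte-reversed from the display hex)."""
    raw = bytes.fromhex(block_hash_hex)
    if len(raw) != 32:
        raise ValueError("block hash must be 32 bytes")
    return b"b" + raw[::-1]


def file_info_key(file_number):
    """'f' + 4-byte file number (little-endian is an assumption)."""
    return b"f" + struct.pack("<i", file_number)


def flag_key(name):
    """'F' + 1-byte flag name length + flag name string."""
    encoded = name.encode("ascii")
    if len(encoded) > 252:
        raise ValueError("name too long for a 1-byte length prefix")
    return b"F" + bytes([len(encoded)]) + encoded


def tx_index_key(txid_hex):
    """'t' + 32-byte transaction hash (assumed byte-reversed, like block hashes)."""
    raw = bytes.fromhex(txid_hex)
    if len(raw) != 32:
        raise ValueError("transaction hash must be 32 bytes")
    return b"t" + raw[::-1]
```

For example, flag_key("txindex") gives the 9-byte key under which the 'txindex' flag would be looked up.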
Inside the chain state database, the following key/value pairs are stored:
'C' + 32-byte transaction hash + varint-encoded output index (v0.15 onwards) -> a single unspent transaction output (UTXO) record. Each record describes the UTXO at the given output index of the given transaction and consists of:
Whether the transaction was a coinbase or not.
The height of the block that contains the transaction.
The scriptPubKey and amount for this unspent output.
'c' + 32-byte transaction hash (v0.14 and earlier) -> unspent transaction output record for that transaction. Unlike 'C', this entry represents all UTXOs from a single transaction. These records are only present for transactions that have at least one unspent output left. Each record stores:
The version of the transaction.
Whether the transaction was a coinbase or not.
The height of the block that contains the transaction.
Which outputs of that transaction are unspent.
The scriptPubKey and amount for those unspent outputs.
'B' -> 32-byte block hash: the block hash up to which the database represents the unspent transaction outputs.
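As a rough sketch of how a per-output 'C' key could be built: the code below assumes the output index is appended using bitcoind's database varint encoding (MSB base-128, with one subtracted from every digit except the last) and that the transaction hash is stored byte-reversed relative to its display form. The function names are hypothetical.

```python
def encode_varint(n):
    """Encode a non-negative integer as a bitcoind-style database varint."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    out = [n & 0x7F]                     # last byte has the high bit clear
    while n > 0x7F:
        n = (n >> 7) - 1                 # subtract one from every earlier digit
        out.append((n & 0x7F) | 0x80)    # high bit set: another byte follows
    return bytes(reversed(out))          # most significant digit first


def utxo_key(txid_hex, output_index):
    """'C' + 32-byte transaction hash + varint output index (assumed layout)."""
    raw = bytes.fromhex(txid_hex)
    if len(raw) != 32:
        raise ValueError("transaction hash must be 32 bytes")
    return b"C" + raw[::-1] + encode_varint(output_index)
```

Output indexes below 128 add a single byte to the key, so most keys would be 34 bytes long under these assumptions.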
Recent versions of bitcoind (please add version compatibility) obfuscate the value in each key/value pair, so you need to XOR the stored value with the obfuscation key to get the real value.
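The XOR step itself is simple. A minimal sketch, assuming the obfuscation key is repeated cyclically from the first byte of each value; how the key is read from the database is not covered here, and the function name is hypothetical:

```python
from itertools import cycle


def deobfuscate(value, obfuscation_key):
    """XOR a stored value with the obfuscation key, repeating the key as needed."""
    if not obfuscation_key:
        return bytes(value)  # an empty key means nothing to undo
    return bytes(b ^ k for b, k in zip(value, cycle(obfuscation_key)))
```

Because XOR is its own inverse, applying the same function twice with the same key returns the original bytes.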
I won't go into the specific serialization details of the particular records. They're often specially designed to be compact on disk, and not really intended to be easily usable by other applications (LevelDB doesn't support concurrent access from multiple applications anyway). There are several RPC methods for querying data from the databases (getblock, gettxoutsetinfo, gettxout) without needing direct access.
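To illustrate the RPC route, here is a small sketch that only builds JSON-RPC request bodies for the methods named above. Sending them (node URL, credentials, HTTP client) is deliberately left out, and the helper name and request id are hypothetical.

```python
import json


def rpc_request(method, params=(), request_id="example"):
    """Build the JSON body of a JSON-RPC call to bitcoind."""
    return json.dumps({
        "jsonrpc": "1.0",
        "id": request_id,
        "method": method,
        "params": list(params),
    })


# gettxout takes a transaction id and an output index.
body = rpc_request("gettxout", ["00" * 32, 0])
```

The same helper works for getblock (with a block hash as parameter) and for gettxoutsetinfo (with no parameters).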
As you can see, only headers are stored inside this database. The actual blocks and transactions are stored in the block files, which are not databases, but just raw append-only files that contain the blocks in network format.
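A rough sketch of walking such a block file follows. It relies on details not stated above, so treat them as assumptions: each block is preceded by the 4-byte network magic and a 4-byte little-endian length, the file contents are not obfuscated, and the mainnet magic is f9beb4d9. The function name is hypothetical.

```python
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")


def iter_raw_blocks(buf, magic=MAINNET_MAGIC):
    """Yield (offset, raw_block) for each framed block found in buf."""
    pos = 0
    while pos + 8 <= len(buf):
        if buf[pos:pos + 4] != magic:
            break  # zero padding or unknown data: stop scanning
        (size,) = struct.unpack_from("<I", buf, pos + 4)
        start = pos + 8
        if start + size > len(buf):
            break  # truncated block at the end of the buffer
        yield start, buf[start:start + size]
        pos = start + size
```

The yielded offset is where the block bytes begin, which is the kind of position a block index record would need to point at.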
As to your second question: what is n? If you just want to access some records, sure, iterate over the keys and stop when you've read enough.
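Stopping after n records can be sketched like this. Here `records` stands for any iterable of (key, value) byte pairs, such as the iterator of whichever LevelDB binding you use; the choice of binding is left open, and the function name is hypothetical. Since LevelDB does not allow concurrent access, this would have to run against a copy of the database or with bitcoind stopped.

```python
from itertools import islice


def first_n_with_prefix(records, prefix, n):
    """Return at most n (key, value) pairs whose key starts with prefix."""
    matching = ((k, v) for k, v in records if k.startswith(prefix))
    return list(islice(matching, n))  # stops reading once n pairs are found
```

Because both the filter and islice are lazy, the underlying iterator is not consumed past the n-th match.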

    BLOCK_HAVE_DATA = 8   # full block available in blk*.dat
    BLOCK_HAVE_UNDO = 16  # undo data available in rev*.dat

    # Variable-length integers: bytes are a MSB base-128 encoding of the number.
    # The high bit in each byte signifies whether another digit follows. To make
    # sure the encoding is one-to-one, one is subtracted from all but the last digit.
    # Thus, the byte sequence a[] with length len, where all but the last byte
    # has bit 128 set, encodes the number:
    #
    #   (a[len-1] & 0x7F) + sum(i=1..len-1, 128^i*((a[len-i-1] & 0x7F)+1))
    #
    # Properties:
    # * Very small (0-127: 1 byte, 128-16511: 2 bytes, 16512-2113663: 3 bytes)
    # * Every integer has exactly one encoding
    # * Encoding does not depend on size of original integer type
    # * No redundancy: every (infinite) byte sequence corresponds to a list
    #   of encoded integers.
    #
    # 0:         [0x00]  256:        [0x81 0x00]
    # 1:         [0x01]  16383:      [0xFE 0x7F]
    # 127:       [0x7F]  16384:      [0xFF 0x00]
    # 128:  [0x80 0x00]  16511:      [0xFF 0x7F]
    # 255:  [0x80 0x7F]  65535: [0x82 0xFE 0x7F]
    # 2^32:           [0x8E 0xFE 0xFE 0xFF 0x00]


    def encode_varint(number):
        """Encodes a non-negative integer using the MSB base-128 scheme."""
        if number < 0:
            raise ValueError("Only non-negative integers can be encoded.")
        result = [number & 0x7F]  # last byte: high bit clear
        while number > 0x7F:
            number = (number >> 7) - 1  # one is subtracted from every earlier digit
            result.append((number & 0x7F) | 0x80)  # high bit set: another byte follows
        return bytes(reversed(result))  # most significant digit first


    def decode_varint(stream):
        """Decodes a variable-length integer from the MSB base-128 format."""
        n = 0
        while True:
            chData = stream.get(1)[0]
            n = (n << 7) | (chData & 0x7F)
            if chData & 0x80:
                n += 1
            else:
                return n


    def read_int(stream, bits):
        """Reads a little-endian field; hashes (more than 64 bits) are returned as hex strings."""
        data = stream.get(bits // 8)
        if bits > 64:
            return data[::-1].hex()
        return int.from_bytes(data, 'little')


    class Stream:
        '''Class to handle byte stream'''

        def __init__(self, hexdata):
            # The sample hex below holds the record with its bytes in reverse
            # order, so reverse them back before reading.
            self.data = bytearray(bytes.fromhex(hexdata))
            self.data.reverse()

        def get(self, n):
            result = self.data[:n]
            self.data = self.data[n:]
            return result


    class BlockHeader:
        def __init__(self, stream):
            self.nVersion = read_int(stream, 32)
            self.hashPrev = read_int(stream, 256)
            self.hashMerkleRoot = read_int(stream, 256)
            self.nTime = read_int(stream, 32)
            self.nBits = read_int(stream, 32)
            self.nNonce = read_int(stream, 32)


    class VarintCBlockIndex:
        def __init__(self, stream):
            self.nVer = decode_varint(stream)
            self.nHeight = decode_varint(stream)
            self.nStatus = decode_varint(stream)
            self.nTx = decode_varint(stream)
            # The file number is present if either the block data or the undo data is.
            self.nFile = decode_varint(stream) if self.nStatus & (BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO) else -1
            self.nDataPos = decode_varint(stream) if self.nStatus & BLOCK_HAVE_DATA else -1
            self.nUndoPos = decode_varint(stream) if self.nStatus & BLOCK_HAVE_UNDO else -1


    if __name__ == '__main__':
        data_hex = '572fe3011b5bede64c91a5338fb300e3fdb6f30a4c67233b997f99fdd518b968b9a3fd65857bfe78b260071900000000001937917bd2caba204bb1aa530ec1de9d0f6736e5d85d96da9c8bba0000000129ffd98136b19a8e00021d00f0833ced8e'

        # Usage: the varint fields come first in the record, then the 80-byte header.
        stream = Stream(data_hex)
        varint_cblockindex = VarintCBlockIndex(stream)
        block_header = BlockHeader(stream)

        # print all data from classes:
        print('varint_cblockindex:')
        print('\tnVer = ', varint_cblockindex.nVer)
        print('\tnHeight = ', varint_cblockindex.nHeight)
        print('\tnStatus = ', varint_cblockindex.nStatus)
        print('\tnTx = ', varint_cblockindex.nTx)
        print('\tnFile = ', varint_cblockindex.nFile)
        print('\tnDataPos = ', varint_cblockindex.nDataPos)
        print('\tnUndoPos = ', varint_cblockindex.nUndoPos)
        print('block_header:')
        print('\tnVersion = ', block_header.nVersion)
        print('\thashPrev = ', block_header.hashPrev)
        print('\thashMerkleRoot = ', block_header.hashMerkleRoot)
        print('\tnTime = ', block_header.nTime)
        print('\tnBits = ', block_header.nBits)
        print('\tnNonce = ', block_header.nNonce)

I hope this will help...
Key Structure (b + 32-byte block hash):
The key for each block index record begins with the letter 'b' to distinguish it from other types of entries in the database (e.g., transaction index entries might start with 't').
Following the 'b' is the 32-byte (256-bit) hash of the block. This hash serves as a unique identifier for the block.
The block hash is stored in little-endian (internal) byte order, i.e. byte-reversed relative to the hex string shown by RPC calls and block explorers.
Value Structure (Block Index Record):
Each block index record associated with a specific block hash contains a combination of data:
Block Header:
80 bytes in total. Contains the following fields:
Version (4 bytes)
Previous Block Hash (32 bytes)
Merkle Root (32 bytes)
Timestamp (4 bytes)
Bits (difficulty target, 4 bytes)
Nonce (4 bytes)
Height:
A variable-length integer (varint) representing the block's height in the blockchain. Indicates how many blocks precede this block in the chain.
Number of Transactions:
A varint indicating the total number of transactions included in the block.
Validation Status:
A varint representing flags that indicate:
Whether the block's data is fully available (BLOCK_HAVE_DATA)
Whether the block's undo data (for transaction reversal) is available (BLOCK_HAVE_UNDO)
File Location and Position:
If the block's data or undo data is available:
A varint indicating the file number, which is shared by the block file (blkXXXXX.dat) and the undo file (revXXXXX.dat).
If the block's data is available:
A varint indicating the byte offset within the block file where the block data starts.
If the block's undo data is available:
A varint indicating the byte offset within the undo file where the undo data starts.
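The status varint can be unpacked with a few bit masks. In this sketch the values 8 and 16 for BLOCK_HAVE_DATA and BLOCK_HAVE_UNDO match the flags named above; treating the low three bits as a validity level (mask 7) is an assumption about Bitcoin Core's layout, and the function name is hypothetical.

```python
BLOCK_VALID_MASK = 7   # assumed: low three bits hold the validity level
BLOCK_HAVE_DATA = 8    # full block available in blk*.dat
BLOCK_HAVE_UNDO = 16   # undo data available in rev*.dat


def describe_status(n_status):
    """Split a block index status value into its parts."""
    return {
        "validity_level": n_status & BLOCK_VALID_MASK,
        "have_data": bool(n_status & BLOCK_HAVE_DATA),
        "have_undo": bool(n_status & BLOCK_HAVE_UNDO),
        # A file number follows if either kind of data is on disk.
        "has_file_number": bool(n_status & (BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO)),
    }
```

For instance, a status of 29 (16 + 8 + 5) would mean both block and undo data are on disk, with validity level 5.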
Important Notes:
Varints: Bitcoin uses varints for space efficiency. Smaller numbers are encoded with fewer bytes than larger ones. This makes the size of the block index record variable depending on the block height, transaction count, etc.
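To make the size effect concrete, here is a small sketch that computes how many bytes a value would take. It assumes the MSB base-128 scheme used for bitcoind's database varints, where one is subtracted from every digit except the last; the function name is hypothetical.

```python
def varint_size(n):
    """Number of bytes needed to encode n as a database varint."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    size = 1
    while n > 0x7F:
        n = (n >> 7) - 1  # same digit step the encoder would take
        size += 1
    return size
```

Under this scheme values up to 127 take one byte, values up to 16511 take two, and a height such as 80000 takes three.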
Data Availability: Block data and undo data are never stored in LevelDB itself. The validation status flags indicate whether they are present on disk, and the file number and offsets tell you where to find them in the actual block and undo files.
Endianness: The block hash and other fields within the block header are stored in little-endian byte order in LevelDB. This means the least significant byte comes first.
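The effect of the byte order is easiest to see when unpacking a header. A minimal sketch using Python's struct module, assuming the standard 80-byte header layout listed earlier; the function name and the returned dictionary keys are hypothetical.

```python
import struct


def parse_block_header(header):
    """Unpack an 80-byte block header; hashes are returned in display order."""
    if len(header) != 80:
        raise ValueError("a block header is exactly 80 bytes")
    # '<' selects little-endian with no padding: 4 + 32 + 32 + 4 + 4 + 4 bytes.
    version, prev, merkle, timestamp, bits, nonce = struct.unpack("<i32s32sIII", header)
    return {
        "version": version,
        "prev_hash": prev[::-1].hex(),      # reverse to get the usual hex form
        "merkle_root": merkle[::-1].hex(),
        "time": timestamp,
        "bits": bits,
        "nonce": nonce,
    }
```

The two hashes are reversed before hex-encoding because the stored bytes are in the opposite order from the form normally shown to users.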
Example (Simplified):
Let's say a block index record looks like this:
Key: b<block_hash>
Value: <varint_client_version><varint_height><varint_status><varint_tx_count><varint_file_num><varint_data_pos><varint_undo_pos><block_header_bytes>
This record tells you:
The block hash (block_hash)
The block header data (block_header_bytes)
The block height (varint_height)
The number of transactions (varint_tx_count)
The block's validation status (varint_status)
Where to find the block data and undo data on disk (if available)