Magic Bitboards and PEXT

The Problem: Sliding Piece Attacks

Why Sliding Pieces Are Hard

Non-sliding pieces (knight, king, pawn):

Fixed attack pattern regardless of board state
Can pre-compute all attacks at startup
Simple lookup: StepAttacksBB[piece][square]

Sliding pieces (rook, bishop, queen):

Attack pattern depends on blocking pieces
Can’t pre-compute all possibilities (too many combinations)

The Challenge

Rook on e4 - different scenarios:

Scenario 1: Empty board
  8  . . . . X . . .
  7  . . . . X . . .
  6  . . . . X . . .
  5  . . . . X . . .
  4  X X X X R X X X  ← Attacks entire rank and file
  3  . . . . X . . .
  2  . . . . X . . .
  1  . . . . X . . .
     a b c d e f g h

Scenario 2: Blocked by pieces
  8  . . . . . . . .
  7  . . . . . . . .
  6  . . . . X . . .
  5  . . . . X . . .
  4  . . X X R X . .  ← Blocked at c4 and f4
  3  . . . . X . . .
  2  . . . . . . . .
  1  . . . . . . . .
     a b c d e f g h
     (Pieces at: c4, e2, e6, f4)

The question: How do we efficiently compute attacks for ANY occupancy pattern?

Naive Solutions (Too Slow)

Solution 1: Compute On-Demand

Bitboard rook_attacks(Square sq, Bitboard occupied) {
    Bitboard attacks = 0;
    
    // North
    for (Square s = sq + 8; s <= SQ_H8; s += 8) {
        attacks |= s;
        if (occupied & s) break;  // Hit a piece, stop
    }
    
    // South
    for (Square s = sq - 8; s >= SQ_A1; s -= 8) {
        attacks |= s;
        if (occupied & s) break;
    }
    
    // East, West (similar)
    ...
    
    return attacks;
}

Cost: 4 loops, up to 7 iterations each = ~20-30 cycles Called: Millions of times per second Too slow!

Solution 2: Pre-compute Everything

// For EACH square (64) and EACH possible occupancy pattern
Bitboard RookAttacks[64][2^64];  // Store all possibilities

Problem:

64 squares × 2^64 occupancies = impossibly large (10^19 entries!)
Would need exabytes of RAM

The Insight: Most Bits Don’t Matter

Relevant Occupancy

For a rook on e4, only these squares matter:

  8  . . . . . . . .
  7  . . . . X . . .  ← Only e7 matters (not e8 - edge)
  6  . . . . X . . .
  5  . . . . X . . .
  4  . X X X R X X .  ← Only b4-d4 and f4-g4 (not a4, h4 - edges)
  3  . . . . X . . .
  2  . . . . X . . .
  1  . . . . . . . .  ← Only e2 matters (not e1 - edge)
     a b c d e f g h

Relevant squares: {b4, c4, d4, f4, g4, e2, e3, e5, e6, e7} = 10 squares

Similarly for a rook in a1

For a rook on e4, only these squares matter:

  8  . . . . . . . .
  7  X . . . . . . . 
  6  X . . . . . . .
  5  X . . . . . . .
  4  X . . . . . . .  
  3  X . . . . . . .
  2  X . . . . . . .
  1  R X X X X X X .  
     a b c d e f g h

Relevant squares: {a2, a3, a4, a5, a6, a7, b1, c1, d1, e1, f1, g1} = 12 squares

Why ignore edges?

Attacks always stop at or go past the edge
Occupancy of edge squares doesn’t change the attack pattern

This is the maximum number of squares we would ever need for a rook.

The Next Idea: Perfect Hashing

The Problem We Still Have

Even with only relevant squares, we still need efficient lookup:

Rook on e4: 10 relevant squares → 2^10 = 1,024 possible occupancy patterns
Rook on a1: 12 relevant squares → 2^12 = 4,096 possible occupancy patterns

We could create a table like:

Bitboard RookAttacks[64][4096];  // Much better than 2^64!

Output would be a bitboard containing all the squares reachable by the piece.

Let’s take a step back and understand what is the 4096 here, its 2^12. At worst, we might need to represent 12 squares whose occupancy affects rook reachability. Since each square can be occupied or empty, we take 2^12.

This means the second index is an encoded way to tell which of these relevant squares are occupied by any piece (blockers),

Concrete Example: Rook on a1 (12 relevant squares)

  8  . . . . . . . .
  7  X . . . . . . . ← a7 (relevant)
  6  X . . . . . . . ← a6 (relevant)
  5  X . . . . . . . ← a5 (relevant)
  4  X . . . . . . . ← a4 (relevant)
  3  X . . . . . . . ← a3 (relevant)
  2  X . . . . . . . ← a2 (relevant)
  1  R X X X X X X . ← b1,c1,d1,e1,f1,g1 (relevant)
     a b c d e f g h

Relevant squares: {a2, a3, a4, a5, a6, a7, b1, c1, d1, e1, f1, g1} = 12 squares

All Possible Occupancy Patterns

Pattern 0: All 12 squares empty

Occupancy: [0,0,0,0,0,0,0,0,0,0,0,0] → Attacks go to edges

Pattern 1: Only a2 occupied

Occupancy: [1,0,0,0,0,0,0,0,0,0,0,0] → Attacks blocked at a2 vertically

Pattern 2: Only a3 occupied

Occupancy: [0,1,0,0,0,0,0,0,0,0,0,0] → Attacks blocked at a3 vertically

Pattern 3: Both a2 and a3 occupied

Occupancy: [1,1,0,0,0,0,0,0,0,0,0,0] → Attacks blocked at a2 (closer blocker)

… continuing through …

Pattern 4095: All 12 squares occupied

Occupancy: [1,1,1,1,1,1,1,1,1,1,1,1] → Attacks only to a2 and b1

Why We Need All Combinations

Each combination produces a different attack pattern:

// Example: Different occupancies → Different attacks

// Empty board
Occupancy: 000000000000 → Attacks: a2-a7, b1-g1

// Piece at a4 only  
Occupancy: 000100000000 → Attacks: a2-a4, b1-g1

// Piece at c1 only
Occupancy: 000000010000 → Attacks: a2-a7, b1-c1

// Pieces at both a4 AND c1
Occupancy: 000100010000 → Attacks: a2-a4, b1-c1 (different from either alone!)

The Table Structure

So we build:

// For rook on a1 with 12 relevant squares:
Bitboard RookAttacks_a1[4096];  // One entry for each occupancy pattern

RookAttacks_a1[0]    = attacks when all relevant squares empty
RookAttacks_a1[1]    = attacks when only a2 occupied
RookAttacks_a1[2]    = attacks when only a3 occupied
RookAttacks_a1[3]    = attacks when a2 and a3 occupied
...
RookAttacks_a1[4095] = attacks when all 12 squares occupied

But in order to generate the second index, we must encode the information of pieces present on same rank and file into a integer.

Naive Approach

A naive way of building the second index would be to loop through the corresponding file and column and set the bits, but loops defeat the whole purpose here.

For a rook on a1

// Extract relevant bits manually
int index = 0;
if (occupied & (1ULL << 1))  index |= (1 << 0);   // Check a2
if (occupied & (1ULL << 2))  index |= (1 << 1);   // Check b1
if (occupied & (1ULL << 3))  index |= (1 << 2);   // Check c1
if (occupied & (1ULL << 4))  index |= (1 << 3);   // Check d1
if (occupied & (1ULL << 5))  index |= (1 << 4);   // Check e1
if (occupied & (1ULL << 6))  index |= (1 << 5);   // Check f1
if (occupied & (1ULL << 8))  index |= (1 << 6);   // Check a3
if (occupied & (1ULL << 16)) index |= (1 << 7);   // Check a4
// ... 12 checks total!

// Finally lookup:
Bitboard attacks = RookAttacks_a1[index];

Cost: 12 conditional checks, 12 bit operations = ~30-40 CPU cycles

The Magic Bitboard Solution

The Mask

In order to quickly extract only the pieces on relevant rank and file, we can use pre-computed masks.

For eg: for a rook on a1, the mask will be

  8  . . . . . . . .
  7  1 . . . . . . . ← a7 (relevant)
  6  1 . . . . . . . ← a6 (relevant)
  5  1 . . . . . . . ← a5 (relevant)
  4  1 . . . . . . . ← a4 (relevant)
  3  1 . . . . . . . ← a3 (relevant)
  2  1 . . . . . . . ← a2 (relevant)
  1  0 1 1 1 1 1 1 . ← b1,c1,d1,e1,f1,g1 (relevant)
     a b c d e f g h

In hex it can be written as

Bitboard mask   = 0x0001010101010126;  // Relevant squares

Similarly there are 64 squares, each square will have a mask for each type of sliding piece.

The Mapping Problem

Once we have the mask, we can extract only the pieces on relevant rank/file/diagonal by using pre-computed mask.

occupied & mask will give us the bitboard of only relevant pieces.

Now for rook on a1, there can be 4096 such bitboards, we need a way to map each of them uniquely to a number between 0 and 4095.

It doesn’t matter which scenario corresponds to what number.

For eg: for rook on a1 scenario:

index 0 could be the case where there are no blockers
index 1 could be the case where there is a blocker on c1
index 2 could be the case where there is a blocker on a6, a3.
…
index 4095 could be the case where are blockers on d1, g1, a4, a6

We need a function which consumes occupied & mask and outputs a unique index between 0 and 4095. Since the input space is finite and bounded, ensuring collision free output space is possible.

How multiplication helps

The key insight: A single multiplication can rearrange bits!

When you multiply two numbers, bits from the multiplicand appear at different positions in the result:

Example:
  x = 0b00001010  (bits at positions 1 and 3)
  m = 0b00000011  (magic multiplier)
  
  x * m = ?
  
In binary multiplication:
      00001010
    × 00000011
    ──────────
      00001010  (x × 1)
     00001010   (x × 2, shifted left 1)
    ──────────
     000011110

Notice: The bits from x appeared at new positions in the result!

Multiplication causes the relevant occupancy bits to interact and spread into the upper bits. With the right magic constant, every different occupancy produces a unique high-bit pattern.

The Magic Bitboard

For each scenario, eg: rook on a1, we choose a seemingly random magic bitboard

Bitboard magic  = 0x0080001020400080;

Magic bitboards work like a hash function:

hash(occupancy_pattern) = ((occupancy & mask) * magic) >> shift

Requirements:

Different patterns → Different indices (no collisions)
Same pattern → Same index (deterministic)
Index range: 0 to 2^N - 1 (where N = number of relevant squares)

For our case of rook on a1, N = 12

Bitboard magic  = 0x0080001020400080;
hash(occupancy_pattern) = ((occupancy & mask) * magic) >> 52

Reason for right shifting by 52 is:

Right shifting by 52 ensures the range is within 0 to 4095 because 2^64 / 2^52 = 2^12.
After multiplication, the useful information ends up in the top bits
We discard the lower noisy bits

The magic mask is not randomly selected, it is carefully found via search such that it scatters the patterns so that there is no collision in the end result of 0 to 4095.

Each square will have its own unique magic mask for each type of sliding piece.

At runtime, Stockfish computes rook attacks in 3 steps:

Extract only relevant blockers:

    occ = occupied & mask

Compress this occupancy into an index:

    index = (occ * magic) >> (64 - N)

Lookup the precomputed attack bitboard:

    attacks = RookAttacks[square][index]

The magic number is chosen offline so that all 2^N occupancies map to unique indices (no collisions).

How are magic numbers calculated?

Magic numbers are found offline using brute force search with heuristics, and then hardcoded into Stockfish.

There is no known direct mathematical formula
They are computed once (during development or build tooling)
The engine does not search for them at runtime

Stockfish just ships with a precomputed array like:

Magic RookMagics[64];
Magic BishopMagics[64];

But the search is not purely random, there are some heuristics which help is to reduce the search space. For eg: for rook we need only higher 12 bits, so first 52 btits of mask don’t contribute anything and they can all be 0. Brute forcing 12 bit number is a solvable problem.

Code Walkthrough

1. The Magic Bitboard Data Structures

Bitboard  RookMasks  [SQUARE_NB];
Bitboard  RookMagics [SQUARE_NB];
Bitboard* RookAttacks[SQUARE_NB];
unsigned  RookShifts [SQUARE_NB];

Bitboard  BishopMasks  [SQUARE_NB];
Bitboard  BishopMagics [SQUARE_NB];
Bitboard* BishopAttacks[SQUARE_NB];
unsigned  BishopShifts [SQUARE_NB];

These arrays store the pre-computed magic bitboard data for all 64 squares, separately for rooks and bishops.

1. RookMasks[SQUARE_NB]

Type: Array of 64 bitboards
Purpose: Stores the relevant occupancy mask for each square
Content: For square sq, RookMasks[sq] is a bitboard with 1’s at all relevant squares (excluding edges)

2. RookMagics[SQUARE_NB]

Type: Array of 64 bitboards (used as 64-bit magic numbers)
Purpose: Stores the magic constant for each square
Content: For square sq, RookMagics[sq] is the magic number found through search that creates a perfect hash

3. RookAttacks[SQUARE_NB]

Type: Array of 64 pointers to bitboards
Purpose: Each pointer points to the attack lookup table for that square
Content:
- RookAttacks[sq] is a pointer to an array of attack bitboards
- The array size depends on the number of relevant bits for that square

Memory layout:

// Conceptually:
RookAttacks[A1]  → points to array of 4096 bitboards (2^12 for corner)
RookAttacks[E4]  → points to array of 1024 bitboards (2^10 for center)
RookAttacks[A8]  → points to array of 4096 bitboards (2^12 for corner)
// etc.

// Example of what RookAttacks[A1] points to:
Bitboard rook_a1_table[4096] = {
    0x01010101010101FE,  // [0]    All relevant squares empty
    0x000000000000017E,  // [1]    Blocked by piece at a2
    0x0101010101010102,  // [2]    Blocked by piece at a3
    0x0000000000000102,  // [3]    Blocked by pieces at a2 and a3
    // ... 4092 more entries
};

RookAttacks[A1] = rook_a1_table;  // Pointer assignment

Usage:

int index = (relevant_occupancy * RookMagics[sq]) >> shift;
Bitboard attacks = RookAttacks[sq][index];

Its pointer instead of hardcoding 4096, because only corner squares need 4096, center squares need only 1024.

Same logic is applied for bishops.

4. RookShifts[SQUARE_NB]

It stores the amount of shift required for each square as per this formula

int index = (relevant_occupancy * RookMagics[sq]) >> shift;

for corner square its 4096, for center its 1024, …

2. magic_index

/// attacks_bb() returns a bitboard representing all the squares attacked by a
/// piece of type Pt (bishop or rook) placed on 's'. The helper magic_index()
/// looks up the index using the 'magic bitboards' approach.
template<PieceType Pt>
inline unsigned magic_index(Square s, Bitboard occupied) {

  extern Bitboard RookMasks[SQUARE_NB];
  extern Bitboard RookMagics[SQUARE_NB];
  extern unsigned RookShifts[SQUARE_NB];
  extern Bitboard BishopMasks[SQUARE_NB];
  extern Bitboard BishopMagics[SQUARE_NB];
  extern unsigned BishopShifts[SQUARE_NB];

  Bitboard* const Masks  = Pt == ROOK ? RookMasks  : BishopMasks;
  Bitboard* const Magics = Pt == ROOK ? RookMagics : BishopMagics;
  unsigned* const Shifts = Pt == ROOK ? RookShifts : BishopShifts;

  if (HasPext)
      return unsigned(pext(occupied, Masks[s]));

  if (Is64Bit)
      return unsigned(((occupied & Masks[s]) * Magics[s]) >> Shifts[s]);

  unsigned lo = unsigned(occupied) & unsigned(Masks[s]);
  unsigned hi = unsigned(occupied >> 32) & unsigned(Masks[s] >> 32);
  return (lo * unsigned(Magics[s]) ^ hi * unsigned(Magics[s] >> 32)) >> Shifts[s];
}

This is a sophisticated, optimized version of the basic magic index calculation.

int index = (relevant_occupancy * RookMagics[sq]) >> shift;

Template parameter Pt: Either ROOK or BISHOP

Allows one function to work for both piece types
Compiler generates two versions at compile time

Step 1: Compile time branching selects the corresponding masks, magics and shifts.

What’s happening:

Uses compile-time selection to pick rook or bishop arrays
Masks, Magics, Shifts are pointers to the appropriate arrays
Since Pt is a template parameter, this is resolved at compile time (zero runtime cost!)

Step 2: Three Different Implementations

The function has three code paths for different CPU capabilities:

Path 1: PEXT Instruction (Modern CPUs)

if (HasPext)
    return unsigned(pext(occupied, Masks[s]));

PEXT (Parallel Bits Extract): A single CPU instruction (BMI2 instruction set) that does exactly what we need!

pext(source, mask) → extracts bits where mask=1, compacts them

Take some bits from a number (source), but only at positions where mask has 1s, and pack them tightly into the low bits.

In magic bitboards, we want this:

occ = occupied & mask;
index = encode(occ);

Where encode() means:

Take the bits on relevant squares
Convert them into a compact number from 0 .. 2^N - 1

That encoding step is exactly what PEXT does instantly.

Mask tells which squares matter

Example: rook on A1 has 12 relevant blocker squares:

a2 a3 a4 a5 a6 a7  b1 c1 d1 e1 f1 g1

So the mask has 1s in those bit positions.

What PEXT does

pext(occupied, mask)

It does:

Look at every bit where mask has a 1
Copy that bit from occupied
Pack them into a small integer

Example

bit:       7 6 5 4 3 2 1 0
occupied = 1 0 1 1 0 1 0 1
mask     = 0 1 1 0 1 0 0 0

Mask selects bits:

bit 6
bit 5
bit 3

bit6 = 0
bit5 = 1
bit3 = 1

Now PEXT packs them into low bits:

result = 0b011

pext(occupied, mask) = 3

We can clearly see why this generates a unique number between 0 to 2^N - 1 (4096 for rook on a1). Because each mask is unique, depending on occupancy some bits of mask might be turned to 0, but the result will always be unique.

This entirely skips the use of magics.

This is the fastest method!

Path 2: 64-bit Magic Bitboards (Standard Case)

if (Is64Bit)
    return unsigned(((occupied & Masks[s]) * Magics[s]) >> Shifts[s]);

This is our familiar version:

int index = (relevant_occupancy * RookMagics[sq]) >> shift;

When used: On 64-bit CPUs without PEXT (most common case until ~2013)

Path 3: 32-bit Magic Bitboards (Legacy)

unsigned lo = unsigned(occupied) & unsigned(Masks[s]);
unsigned hi = unsigned(occupied >> 32) & unsigned(Masks[s] >> 32);
return (lo * unsigned(Magics[s]) ^ hi * unsigned(Magics[s] >> 32)) >> Shifts[s];

Why this is needed: On 32-bit CPUs, multiplying two 64-bit numbers is expensive!

The trick: Split 64-bit numbers into two 32-bit halves

occupied (64-bit) = [hi 32 bits | lo 32 bits]
Masks[s] (64-bit) = [hi 32 bits | lo 32 bits]

Step 1: Extract relevant bits in each half
  lo = (occupied & 0xFFFFFFFF) & (Masks[s] & 0xFFFFFFFF)
  hi = (occupied >> 32) & (Masks[s] >> 32)

Step 2: Hash each half separately
  lo_hash = lo * (lower 32 bits of Magics[s])
  hi_hash = hi * (upper 32 bits of Magics[s])

Step 3: Combine with XOR and shift
  index = (lo_hash ^ hi_hash) >> Shifts[s]

Why XOR? It mixes the two 32-bit hashes into one hash value

When used: On 32-bit CPUs (rare nowadays, but important for embedded systems)

The underlying tables are different for PEXT and magic bitboard version, so we don’t need to worry about producing same index in both versions.

But this also means the type of implementation is decided at compile time, a binary compiled with PEXT implementation can’t be used on 64 bit CPU without BMI2.

3. init_magics

 // init_magics() computes all rook and bishop attacks at startup. Magic
  // bitboards are used to look up attacks of sliding pieces. As a reference see
  // chessprogramming.wikispaces.com/Magic+Bitboards. In particular, here we
  // use the so called "fancy" approach.

  void init_magics(Bitboard table[], Bitboard* attacks[], Bitboard magics[],
                   Bitboard masks[], unsigned shifts[], Square deltas[], Fn index) {

    int seeds[][RANK_NB] = { { 8977, 44560, 54343, 38998,  5731, 95205, 104912, 17020 },
                             {  728, 10316, 55013, 32803, 12281, 15100,  16645,   255 } };

    Bitboard occupancy[4096], reference[4096], edges, b;
    int age[4096] = {0}, current = 0, i, size;

    // attacks[s] is a pointer to the beginning of the attacks table for square 's'
    attacks[SQ_A1] = table;

    for (Square s = SQ_A1; s <= SQ_H8; ++s)
    {
        // Board edges are not considered in the relevant occupancies
        edges = ((Rank1BB | Rank8BB) & ~rank_bb(s)) | ((FileABB | FileHBB) & ~file_bb(s));

        // Given a square 's', the mask is the bitboard of sliding attacks from
        // 's' computed on an empty board. The index must be big enough to contain
        // all the attacks for each possible subset of the mask and so is 2 power
        // the number of 1s of the mask. Hence we deduce the size of the shift to
        // apply to the 64 or 32 bits word to get the index.
        masks[s]  = sliding_attack(deltas, s, 0) & ~edges;
        shifts[s] = (Is64Bit ? 64 : 32) - popcount(masks[s]);

        // Use Carry-Rippler trick to enumerate all subsets of masks[s] and
        // store the corresponding sliding attack bitboard in reference[].
        b = size = 0;
        do {
            occupancy[size] = b;
            reference[size] = sliding_attack(deltas, s, b);

            if (HasPext)
                attacks[s][pext(b, masks[s])] = reference[size];

            size++;
            b = (b - masks[s]) & masks[s];
        } while (b);

        // Set the offset for the table of the next square. We have individual
        // table sizes for each square with "Fancy Magic Bitboards".
        if (s < SQ_H8)
            attacks[s + 1] = attacks[s] + size;

        if (HasPext)
            continue;

        PRNG rng(seeds[Is64Bit][rank_of(s)]);

        // Find a magic for square 's' picking up an (almost) random number
        // until we find the one that passes the verification test.
        do {
            do
                magics[s] = rng.sparse_rand<Bitboard>();
            while (popcount((magics[s] * masks[s]) >> 56) < 6);

            // A good magic must map every possible occupancy to an index that
            // looks up the correct sliding attack in the attacks[s] database.
            // Note that we build up the database for square 's' as a side
            // effect of verifying the magic.
            for (++current, i = 0; i < size; ++i)
            {
                unsigned idx = index(s, occupancy[i]);

                if (age[idx] < current)
                {
                    age[idx] = current;
                    attacks[s][idx] = reference[i];
                }
                else if (attacks[s][idx] != reference[i])
                    break;
            }
        } while (i < size);
    }
  }

Parameters:

table[] - Large pre-allocated array to hold all attack bitboards
attacks[] - Array of 64 pointers, will point into table[]
magics[] - Will be filled with magic numbers for each square
masks[] - Will be filled with relevant occupancy masks
shifts[] - Will be filled with shift amounts
deltas[] - Movement directions (e.g., {+8, -8, +1, -1} for rook)
index - Function pointer to calculate index (64-bit or 32-bit version)

Called twice at startup:

init_magics(RookTable, RookAttacks, RookMagics, RookMasks, RookShifts, 
            RookDeltas, magic_index<ROOK>);
            
init_magics(BishopTable, BishopAttacks, BishopMagics, BishopMasks, BishopShifts,
            BishopDeltas, magic_index<BISHOP>);

Part 1: Random Seeds

int seeds[][RANK_NB] = { 
    { 8977, 44560, 54343, 38998,  5731, 95205, 104912, 17020 },  // 64-bit
    {  728, 10316, 55013, 32803, 12281, 15100,  16645,   255 }   // 32-bit
};

Purpose: Seeds for random number generator, one per rank Why different seeds per rank?

Squares on different ranks have different numbers of relevant bits
Different seeds help find magic numbers faster
These specific values were found empirically to work well

attacks[SQ_A1] = table;

for (Square s = SQ_A1; s <= SQ_H8; ++s) {
    // ... (work for square s)
    
    if (s < SQ_H8)
        attacks[s + 1] = attacks[s] + size;
}

What’s happening: Building variable-sized tables in one contiguous array

table[] (one big array):
┌────────────────────────────────────────────────────┐
│ A1 attacks | A2 attacks | A3 attacks | ... | H8    │
│ (4096)     | (2048)     | (2048)     |     | (4096)│
└────────────────────────────────────────────────────┘
 ↑            ↑            ↑                  ↑
 attacks[A1]  attacks[A2]  attacks[A3]       attacks[H8]

Each square gets a pointer to its portion of the table:

attacks[A1] = &table[0];
attacks[A2] = &table[4096];      // A1 used 4096 entries
attacks[A3] = &table[4096+2048]; // A2 used 2048 entries
// etc.

This is the “Fancy Magic Bitboards” approach - variable-sized tables to save memory!

Part 3: Compute Mask for Each Square

for (Square s = SQ_A1; s <= SQ_H8; ++s)
{
    // Board edges are not considered in the relevant occupancies
    edges = ((Rank1BB | Rank8BB) & ~rank_bb(s)) | ((FileABB | FileHBB) & ~file_bb(s));

    masks[s] = sliding_attack(deltas, s, 0) & ~edges;
    shifts[s] = (Is64Bit ? 64 : 32) - popcount(masks[s]);

edges contains a bitboard where 1 means we can ignore that square for occupancy calculation. Its calculated for each square

Step 1: Calculate edge squares to exclude

// For rook on e4:
edges = ((Rank1BB | Rank8BB) & ~rank_bb(E4))  // Ranks 1 and 8, but NOT rank 4
      | ((FileABB | FileHBB) & ~file_bb(E4)); // Files a and h, but NOT file e

// Result: edges excludes e1, e8, a4, h4

Step 2: Compute sliding attacks on empty board, then remove edges

masks[s] = sliding_attack(deltas, s, 0)  // Attacks on empty board
         & ~edges;                        // Remove edge squares

// For rook on a1:
// sliding_attack gives: a1-a8, a1-h1 (full rank and file)
// Remove edges: a8, h1
// Result: {a2, a3, a4, a5, a6, a7, b1, c1, d1, e1, f1, g1}

Step 3: Calculate shift amount

shifts[s] = (Is64Bit ? 64 : 32) - popcount(masks[s]);

// For rook on a1: 12 relevant bits
// shifts[A1] = 64 - 12 = 52 (for 64-bit)

Part 4: Enumerate All Occupancy Patterns (Carry-Rippler)

b = size = 0;
do {
    occupancy[size] = b;
    reference[size] = sliding_attack(deltas, s, b);
    
    if (HasPext)
        attacks[s][pext(b, masks[s])] = reference[size];
    
    size++;
    b = (b - masks[s]) & masks[s];  // Carry-Rippler trick!
} while (b);

For this square s, generate every possible blocker configuration on the relevant squares.

For a rook square, you may have:

12 relevant squares
so 2¹² = 4096 possible occupancies

Stockfish must generate all of them at startup:

blocker pattern → correct attack bitboard

So we need:

for every subset b of mask:
    attacks[b] = sliding_attack(...)

Carry-Rippler Trick: Enumerating all subsets

The trick is:

b = (b - mask) & mask;

This generates all subsets of mask in a cycle.

What does “subset” mean?

If:

mask = 0b10110

The subset bitboards are:

Note: how 1st and 4th bit remain 0 in all subsets. Every subset contains only bits that exist in mask.

Why is this called Carry-Rippler?

Because subtraction causes a binary carry ripple through bits.

When you subtract mask, the borrow propagates through the bit pattern, flipping bits in exactly the right way to produce the next subset.

The Carry-Rippler Trick: Enumerates all subsets of masks[s]

How it works:

// Example: mask = 0b00010110 (bits at positions 1, 2, 4)
// We want to generate: 0b00000000, 0b00000010, 0b00000100, 0b00000110, 
//                      0b00010000, 0b00010010, 0b00010100, 0b00010110

b = 0;
Iteration 0: b = 0b00000000  // Subset: {}
Iteration 1: b = 0b00000010  // Subset: {1}
Iteration 2: b = 0b00000100  // Subset: {2}
Iteration 3: b = 0b00000110  // Subset: {1, 2}
Iteration 4: b = 0b00010000  // Subset: {4}
Iteration 5: b = 0b00010010  // Subset: {1, 4}
Iteration 6: b = 0b00010100  // Subset: {2, 4}
Iteration 7: b = 0b00010110  // Subset: {1, 2, 4}
Iteration 8: b = 0b00000000  // Wraps around, exit

Why b = (b - masks[s]) & masks[s] works:

Subtracting flips bits in a clever way
AND with mask keeps only relevant bits
Magically iterates through all 2^N subsets!

If PEXT available: Fill table immediately

if (HasPext)
    attacks[s][pext(b, masks[s])] = reference[size];
// PEXT gives us the index directly, so populate table now

Part 5: Find Magic Numbers (The Hard Part!)

if (HasPext)
    continue;  // Skip magic finding if we have PEXT

PRNG rng(seeds[Is64Bit][rank_of(s)]);

do {
    // Generate random magic candidate
    do
        magics[s] = rng.sparse_rand<Bitboard>();
    while (popcount((magics[s] * masks[s]) >> 56) < 6);
    
    // Test if this magic works
    for (++current, i = 0; i < size; ++i) {
        unsigned idx = index(s, occupancy[i]);
        
        if (age[idx] < current) {
            age[idx] = current;
            attacks[s][idx] = reference[i];
        }
        else if (attacks[s][idx] != reference[i])
            break;  // Collision! Try next magic
    }
} while (i < size);

Step 5a: Generate Magic Candidate

do
    magics[s] = rng.sparse_rand<Bitboard>();
while (popcount((magics[s] * masks[s]) >> 56) < 6);

sparse_rand(): Generates numbers with few bits set (sparse bitboards)

Magic numbers work better when sparse
Fewer bits = less chance of unwanted carries

Filter: popcount((magics[s] * masks[s]) >> 56) < 6

Quick rejection test
Multiplying magic by mask should produce at least 6 bits in top byte
If not enough bits at top, magic won’t spread occupancy well

Step 5b: Test the Magic

for (++current, i = 0; i < size; ++i) {
    unsigned idx = index(s, occupancy[i]);
    
    if (age[idx] < current) {
        age[idx] = current;
        attacks[s][idx] = reference[i];
    }
    else if (attacks[s][idx] != reference[i])
        break;  // Collision!
}

The age[] array trick: Detects collisions efficiently

int age[4096] = {0};  // Tracks which "attempt" last wrote to each index
int current = 0;       // Current attempt number

// For each new magic candidate:
++current;  // New attempt number

for each occupancy pattern i:
    idx = hash(occupancy[i]) using candidate magic
    
    if (age[idx] < current):
        // This index hasn't been used in this attempt yet
        age[idx] = current;
        attacks[s][idx] = reference[i];  // Store the correct attacks
    else:
        // This index was already used in this attempt!
        if (attacks[s][idx] != reference[i]):
            // COLLISION! Different occupancies map to same index
            // but have different attacks
            break;  // Reject this magic

Why this works:

If two occupancies hash to the same index but have the same attacks, it’s OK (constructive collision)
If they have different attacks, the magic is invalid (destructive collision)
age[] lets us detect this without clearing the array each attempt

A simple way to make sense of carry rippler is to think of subtraction as 2’s complement addition.

If we take 0b111 as mask. Cycle goes like this

b = 0, (0 - 0b111) = (0 + 0b001) = 0b0001 = 1
b = 1, (1 - 0b111) = (1 + 0b001) = 0b0010 =  2
b = 2, (0b010 - 0b111) = (0b010 + 0b001) = 0b0011 = 3
b = 2, (0b011 - 0b111) = (0b011 + 0b001) = 0b0100 = 4
...

We can see how it goes on generating all possible bit combinations.

Similarly for a partial mask like 0b101

b = 0, (0 - 0b101) = (0 + 0b011) = (0b011 & 0b101) = 0b001
b = 1, (1 - 0b001) = (0b001 + 0b011) = (0b100 & 0b101) = 0b100
b = 4, (4 - 0b001) = (0b100 + 0b011) = (0b111 & 0b101) = 0b101
b = 5, (5 - 0b001) = (0b101 + 0b011) = (0b000 & 0b101) = 0b000 <- wrap back, cycle ends

Calculating the Actual Attacks

/// Position::attackers_to() computes a bitboard of all pieces which attack a
/// given square. Slider attacks use the occupied bitboard to indicate occupancy.

Bitboard Position::attackers_to(Square s, Bitboard occupied) const {

  return  (attacks_from<PAWN>(s, BLACK)    & pieces(WHITE, PAWN))
        | (attacks_from<PAWN>(s, WHITE)    & pieces(BLACK, PAWN))
        | (attacks_from<KNIGHT>(s)         & pieces(KNIGHT))
        | (attacks_bb<ROOK  >(s, occupied) & pieces(ROOK,   QUEEN))
        | (attacks_bb<BISHOP>(s, occupied) & pieces(BISHOP, QUEEN))
        | (attacks_from<KING>(s)           & pieces(KING));
}

template<PieceType Pt>
inline Bitboard attacks_bb(Square s, Bitboard occupied) {

  extern Bitboard* RookAttacks[SQUARE_NB];
  extern Bitboard* BishopAttacks[SQUARE_NB];

  return (Pt == ROOK ? RookAttacks : BishopAttacks)[s][magic_index<Pt>(s, occupied)];
}

If we notice attacks_bb() computes rook attacks… but it never checks if there is actually a rook there? It only takes occupied. How does this make sense?

Because attacks_bb() does NOT mean “which rooks attack”.

It means: “If a rook were placed on square s, what squares would it attack, given the current blockers (occupied)?”

So how do we check if an actual rook can attack these squares?

Bitboard rookAttackers =
    attacks_bb<ROOK>(sq, occupied) & pieces(color, ROOK);

the & condition takes care of whether actual pieces are present on those attack rays.

Queen is considered for both rook and bishop, because if a rook can attack in an attack ray, queen can do the same. Same goes with bishop as well.

To conclude attacks_bb<ROOK>(sq, occupied) returns a bitboard where a rook on sq could attack considering all the blockers, when we do bitwise & with actual rooks and queens, we are checking if any rook/queen present in any of those squares, then this piece can be a attacker to the square sq.

Magic Bitboards and PEXT#

The Problem: Sliding Piece Attacks#

Why Sliding Pieces Are Hard#

The Challenge#

Naive Solutions (Too Slow)#

Solution 1: Compute On-Demand#

Solution 2: Pre-compute Everything#

The Insight: Most Bits Don’t Matter#

Relevant Occupancy#

The Next Idea: Perfect Hashing#

The Problem We Still Have#

All Possible Occupancy Patterns#

Why We Need All Combinations#

The Table Structure#

Naive Approach#

The Magic Bitboard Solution#

The Mask#

The Mapping Problem#

How multiplication helps#

The Magic Bitboard#

How are magic numbers calculated?#

Code Walkthrough#

1. The Magic Bitboard Data Structures#

1. RookMasks[SQUARE_NB]#

2. RookMagics[SQUARE_NB]#

3. RookAttacks[SQUARE_NB]#

4. RookShifts[SQUARE_NB]#

2. magic_index#

Step 1: Compile time branching selects the corresponding masks, magics and shifts.#

Step 2: Three Different Implementations#

Path 1: PEXT Instruction (Modern CPUs)#

Path 2: 64-bit Magic Bitboards (Standard Case)#

Path 3: 32-bit Magic Bitboards (Legacy)#

3. init_magics#

Part 1: Random Seeds#

Part 3: Compute Mask for Each Square#

Step 1: Calculate edge squares to exclude#

Step 2: Compute sliding attacks on empty board, then remove edges#

Step 3: Calculate shift amount#

Part 4: Enumerate All Occupancy Patterns (Carry-Rippler)#

Part 5: Find Magic Numbers (The Hard Part!)#

Calculating the Actual Attacks#

References