Enterprise Shield – The Evolution – Part 2: The System

Now that you have seen the why of Enterprise Shield, this post presents the how. Migrating from a simple set of shell scripts and flat files makes the system more complex, but also far more manageable and scalable.

Adding a reporting capability also makes it easy to track who is trying to get in and to report on volume, location, network, and type of attack.


1. The Firewall Architecture: Chains and Sets

1.1 How Traffic Flows

Every inbound packet passes through this evaluation sequence before anything else happens. Enterprise Shield inserts itself at position 1 of the INPUT chain, ahead of UFW’s rules.

Inbound Packet (any source)
         │
         ▼
┌─────────────────────────────┐
│        INPUT chain          │  ← UFW manages this
│  [position 1] SHIELD-LOGIC ─┼──────────────────────────────────────────┐
│  [position 2+] UFW rules    │                                          │
└─────────────────────────────┘                                          │
                                                                         ▼
                                              ┌──────────────────────────────────────────┐
                                              │           SHIELD-LOGIC chain             │
                                              │                                          │
                                              │  1. ESTABLISHED/RELATED → ACCEPT         │
                                              │     (existing connections pass through)  │
                                              │                                          │
                                              │  2. loopback (lo) → ACCEPT               │
                                              │                                          │
                                              │  3. LAN + own IP → ACCEPT                │
                                              │     (192.168.x.x, 127.x.x.x, public IP)  │
                                              │                                          │
                                              │  4. shield_allow → ACCEPT                │
                                              │     (manually whitelisted CIDRs/IPs)     │
                                              │                                          │
                                              │  5. shield_abuseipdb → DROP + LOG        │
                                              │     (AbuseIPDB flagged IPs, ≥90 score)   │
                                              │                                          │
                                              │  6. shield_block → DROP + LOG            │
                                              │     (ASN blocks + country blocks)        │
                                              │                                          │
                                              │  7. shield_penalty → DROP + LOG          │
                                              │     (time-limited penalty box)           │
                                              │                                          │
                                              │  8. shield_azure → AZURE-RATELIMIT       │
                                              │     (AS8075 / Microsoft Azure)           │
                                              │                                          │
                                              │  9. shield_hyperscaler → CLOUD-RATELIMIT │
                                              │     (AWS, GCP, Oracle, Cloudflare)       │
                                              │                                          │
                                              │ 10. RETURN → UFW handles remaining       │
                                              └──────────────────────────────────────────┘

Critical ordering note: shield_abuseipdb fires before the Azure and hyperscaler rate-limit rules (steps 8-9). This means a known-bad Azure IP is dropped outright rather than merely rate-limited. This was an explicit design decision made during the final production verification.
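
As a rough illustration of that ordering, the sketch below builds the SHIELD-LOGIC rules as iptables argument lists, in the same sequence as steps 1-10. The function names, the 192.168.0.0/16 and 127.0.0.0/8 ranges, and the example public IP are assumptions made for the example, not the actual shield.py code.

```python
# Sketch only: rule order mirrors steps 1-10 of the SHIELD-LOGIC diagram.
# Function names, LAN ranges and the public_ip argument are illustrative.

def drop_with_log(chain: str, ipset: str, tag: str) -> list[list[str]]:
    """LOG first, then DROP, both matching the same ipset on source address."""
    match = ["-A", chain, "-m", "set", "--match-set", ipset, "src"]
    return [
        match + ["-j", "LOG", "--log-prefix", f"[{tag}] "],
        match + ["-j", "DROP"],
    ]


def build_shield_logic(public_ip: str, chain: str = "SHIELD-LOGIC") -> list[list[str]]:
    """Return iptables argument lists in the evaluation order of steps 1-10."""

    def jump(ipset: str, target: str) -> list[str]:
        return ["-A", chain, "-m", "set", "--match-set", ipset, "src", "-j", target]

    rules = [
        ["-A", chain, "-m", "conntrack", "--ctstate", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
        ["-A", chain, "-i", "lo", "-j", "ACCEPT"],
    ]
    # Step 3: LAN, loopback range and the server's own public address.
    for source in ("192.168.0.0/16", "127.0.0.0/8", public_ip):
        rules.append(["-A", chain, "-s", source, "-j", "ACCEPT"])
    rules.append(jump("shield_allow", "ACCEPT"))
    # Steps 5-7: hard drops come before any rate limiting.
    rules += drop_with_log(chain, "shield_abuseipdb", "SHIELD_ABUSEIPDB")
    rules += drop_with_log(chain, "shield_block", "SHIELD_BLOCK")
    rules += drop_with_log(chain, "shield_penalty", "SHIELD_PENALTY")
    # Steps 8-9: cloud ranges are handed to the rate-limit chains.
    rules.append(jump("shield_azure", "AZURE-RATELIMIT"))
    rules.append(jump("shield_hyperscaler", "CLOUD-RATELIMIT"))
    # Step 10: everything else falls through to UFW.
    rules.append(["-A", chain, "-j", "RETURN"])
    return rules
```

Each inner list could be passed to `iptables` as arguments. Because the shield_abuseipdb DROP is appended before the shield_azure jump, an address present in both sets never reaches AZURE-RATELIMIT.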

1.2 The Rate-Limit Chains

Traffic that matches the Azure or hyperscaler ipsets is not simply blocked, because doing so would break Bingbot (which runs on AS8075) and legitimate cloud-based monitoring tools. Instead, it flows into dedicated rate-limit chains:

AZURE-RATELIMIT chain
├── hashlimit: 3 connections/minute per source IP, burst 2
├── If within limit: ACCEPT
└── If over limit: DROP + LOG [SHIELD_AZURE_LIMIT]

CLOUD-RATELIMIT chain
├── hashlimit: 20 connections/minute per source IP, burst 8
├── If within limit: ACCEPT
└── If over limit: DROP + LOG [SHIELD_CLOUD_LIMIT]

Azure gets a tighter limit (3/min) because it’s the most frequently abused hyperscaler ASN against this server. AWS/GCP/Oracle/Cloudflare get more headroom (20/min) to accommodate legitimate crawler and monitoring traffic.

Known limitation: HTTP/1.1 keep-alive connections bypass the hashlimit entirely because ESTABLISHED,RELATED packets are accepted at step 1 before they ever reach the rate-limit chains. This is a fundamental iptables constraint, not a bug in Enterprise Shield.
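
A rate-limit chain of this shape might be expressed with the iptables hashlimit match roughly as follows. This is a sketch under assumptions: the helper name and the hashlimit bucket names (`shield_azure`, `shield_cloud`) are invented for the example, and the real chains may use different options.

```python
# Sketch only: one ACCEPT rule guarded by hashlimit, then LOG and DROP for
# whatever exceeds the limit. Helper and bucket names are hypothetical.

def ratelimit_chain(chain: str, name: str, per_minute: int, burst: int, tag: str) -> list[list[str]]:
    """Return iptables argument lists that create and fill one rate-limit chain."""
    return [
        ["-N", chain],
        [
            "-A", chain, "-m", "hashlimit",
            "--hashlimit-name", name,        # name of the kernel hash table
            "--hashlimit-mode", "srcip",     # one bucket per source IP
            "--hashlimit-upto", f"{per_minute}/minute",
            "--hashlimit-burst", str(burst),
            "-j", "ACCEPT",
        ],
        # Anything that reaches this point is over the limit.
        ["-A", chain, "-j", "LOG", "--log-prefix", f"[{tag}] "],
        ["-A", chain, "-j", "DROP"],
    ]


azure = ratelimit_chain("AZURE-RATELIMIT", "shield_azure", 3, 2, "SHIELD_AZURE_LIMIT")
cloud = ratelimit_chain("CLOUD-RATELIMIT", "shield_cloud", 20, 8, "SHIELD_CLOUD_LIMIT")
```

Since established connections are accepted at step 1, the packets counted here are in practice the ones that open new connections, which is why the limits read as connections per minute.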

1.3 The Seven ipsets

┌────────────────────┬───────────────┬────────────────────────────────────────┐
│ ipset name         │ type          │ contents                               │
├────────────────────┼───────────────┼────────────────────────────────────────┤
│ shield_allow       │ hash:net      │ Manually whitelisted CIDRs/IPs         │
│ shield_abuseipdb   │ hash:net      │ AbuseIPDB flagged IPs (score ≥ 90)     │
│ shield_block       │ hash:net      │ ASN blocks + 43-country blocks         │
│ shield_penalty     │ hash:net      │ Time-limited penalty box               │
│ shield_azure       │ hash:net      │ AS8075 (Microsoft Azure) CIDRs         │
│ shield_hyperscaler │ hash:net      │ AWS, GCP, Oracle, Cloudflare CIDRs     │
│ shield_country     │ hash:net      │ Country blocks (may merge with _block) │
└────────────────────┴───────────────┴────────────────────────────────────────┘

All sets: maxelem 1000000, hashsize 131072
Total entries across all sets: ~500,000 CIDRs

1.4 The LOG Rules and Tag Format

Every DROP action in SHIELD-LOGIC includes a LOG rule that fires first. Each log message is tagged so rsyslog can route it:

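┌──────────────────────┬──────────────────────────────────────────┐
│ Tag                  │ Meaning                                  │
├──────────────────────┼──────────────────────────────────────────┤
│ [SHIELD_BLOCK]       │ Dropped by shield_block (ASN or country) │
│ [SHIELD_ABUSEIPDB]   │ Dropped by shield_abuseipdb              │
│ [SHIELD_PENALTY]     │ Dropped by shield_penalty (penalty box)  │
│ [SHIELD_AZURE_LIMIT] │ Rate-limited by AZURE-RATELIMIT chain    │
│ [SHIELD_CLOUD_LIMIT] │ Rate-limited by CLOUD-RATELIMIT chain    │
└──────────────────────┴──────────────────────────────────────────┘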

The log line format from iptables looks like:

Jun 03 14:22:11 creaky2 kernel: [SHIELD_BLOCK] IN=eth0 OUT= SRC=185.220.101.47 DST=x.x.x.x PROTO=TCP DPT=443 ...

rsyslog matches on [SHIELD_ prefix and routes to /var/log/enterprise_shield/hits.log, which hits_parser.py then consumes every 5 minutes.


2. The Database Schema

The SQLite database is the authoritative state of the entire system. If you have the database, you can reconstruct everything.

2.1 Schema Overview

shield.db
│
├── cidr_blocks          ← every CIDR the system manages
├── asn_registry         ← all 447+ ASNs with their classification
├── country_registry     ← 43 countries with ETag cache
├── abuseipdb_entries    ← per-IP AbuseIPDB data, independent lifecycle
├── firewall_hits        ← raw parsed hits from rsyslog (rolling window)
├── hits_hourly          ← aggregated hourly rollup (long-term storage)
├── ip_enrichment        ← per-IP enrichment cache (ip-api.com, Shodan)
├── campaigns            ← computed attack campaign groupings (rebuilt each run)
└── system_state         ← all runtime state: timestamps, flags, hashes

2.2 cidr_blocks — The Core Table

CREATE TABLE cidr_blocks (
    id              INTEGER PRIMARY KEY,
    cidr            TEXT NOT NULL,          -- e.g. "185.220.0.0/16"
    source          TEXT NOT NULL,          -- 'asn', 'country', 'manual', 'allow'
    source_id       TEXT,                   -- ASN number or country code
    ipset_target    TEXT NOT NULL,          -- which ipset this CIDR belongs to
    first_seen      INTEGER,                -- epoch timestamp
    last_verified   INTEGER,               -- epoch of last successful WHOIS confirm
    UNIQUE(cidr, ipset_target)
);

The ipset_target field is what drives the actual ipset membership. When the rebuild runs, it diffs this table against the live ipset state and applies only the delta.
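
The diff itself is plain set arithmetic. A minimal sketch, assuming the desired state arrives as (cidr, ipset_target) rows and the live state as a mapping of ipset name to members (both shapes, and the function name, are assumptions for the example):

```python
from collections import defaultdict

# Sketch only: compare what the database wants with what the kernel has.

def compute_delta(db_rows, live_members):
    """Return {ipset: {"add": [...], "delete": [...]}} for every known ipset."""
    desired = defaultdict(set)
    for cidr, ipset_target in db_rows:
        desired[ipset_target].add(cidr)

    delta = {}
    for name in set(desired) | set(live_members):
        want = desired.get(name, set())
        have = set(live_members.get(name, ()))
        delta[name] = {
            "add": sorted(want - have),      # in the DB, not yet in the ipset
            "delete": sorted(have - want),   # in the ipset, no longer in the DB
        }
    return delta
```

With roughly 500,000 CIDRs under management, applying only these two lists is far cheaper than reloading every set from scratch.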

2.3 asn_registry — ASN Classification

CREATE TABLE asn_registry (
    asn             TEXT PRIMARY KEY,      -- 'AS3209', 'AS8075', etc.
    name            TEXT,                  -- human-readable name from WHOIS
    classification  TEXT NOT NULL,         -- 'block', 'azure', 'hyperscaler', 'allow'
    last_whois      INTEGER,               -- epoch of last WHOIS lookup
    fail_count      INTEGER DEFAULT 0,     -- consecutive WHOIS failures
    note            TEXT                   -- operator annotation
);

The classification field determines which ipset a CIDR ends up in: azure → shield_azure, hyperscaler → shield_hyperscaler, block → shield_block.

2.4 abuseipdb_entries — Separate Lifecycle

CREATE TABLE abuseipdb_entries (
    ip              TEXT PRIMARY KEY,
    abuse_score     INTEGER,              -- 0-100 confidence score
    country_code    TEXT,
    last_seen       TEXT,                 -- from AbuseIPDB's "lastReportedAt"
    refreshed_at    INTEGER,             -- epoch of our last fetch
    in_ipset        INTEGER DEFAULT 0    -- currently loaded into shield_abuseipdb?
);

AbuseIPDB entries are refreshed 5× daily (at 00:00, 05:00, 10:00, 15:00, 20:00 UTC) because of the free tier’s rate limit on the blacklist endpoint. The confidence threshold is 90 — only IPs with a score of 90 or above are loaded into the ipset. The table currently holds ~10,000 entries.

2.5 firewall_hits and hits_hourly — The Hit Pipeline

CREATE TABLE firewall_hits (
    id          INTEGER PRIMARY KEY,
    hit_time    INTEGER NOT NULL,         -- epoch timestamp
    src_ip      TEXT NOT NULL,
    dst_port    INTEGER,
    protocol    TEXT,
    shield_tag  TEXT,                     -- SHIELD_BLOCK, SHIELD_ABUSEIPDB, etc.
    log_line    TEXT                      -- raw log line for debugging
);

CREATE TABLE hits_hourly (
    hour_bucket INTEGER NOT NULL,         -- epoch rounded to hour
    src_ip      TEXT NOT NULL,
    shield_tag  TEXT NOT NULL,
    hit_count   INTEGER DEFAULT 0,
    PRIMARY KEY (hour_bucket, src_ip, shield_tag)
);

Raw hits accumulate in firewall_hits. The hits_rollup.py job runs at 03:00 daily and aggregates rows older than RAW_RETENTION_DAYS (default: 3 days) into hits_hourly, then deletes the raw rows. hits_hourly retains data for AGGREGATE_RETENTION_DAYS (default: 365 days).
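
One way the rollup could be written against the two tables above is a single INSERT ... SELECT with an upsert, followed by a delete. This is a sketch, not the actual hits_rollup.py: the function name is invented, it assumes every raw row has a shield_tag, and the ON CONFLICT clause needs SQLite 3.24 or newer.

```python
import sqlite3

# Sketch only: aggregate raw hits older than the retention window into
# hourly buckets, then delete the raw rows in the same transaction.

def rollup_hits(conn: sqlite3.Connection, now: int, raw_retention_days: int = 3) -> int:
    """Roll old firewall_hits rows into hits_hourly; return the raw rows deleted."""
    cutoff = now - raw_retention_days * 86400
    with conn:  # commit on success, roll back on error
        conn.execute(
            """
            INSERT INTO hits_hourly (hour_bucket, src_ip, shield_tag, hit_count)
            SELECT (hit_time / 3600) * 3600, src_ip, shield_tag, COUNT(*)
            FROM firewall_hits
            WHERE hit_time < ?
            GROUP BY (hit_time / 3600) * 3600, src_ip, shield_tag
            ON CONFLICT(hour_bucket, src_ip, shield_tag)
            DO UPDATE SET hit_count = hit_count + excluded.hit_count
            """,
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM firewall_hits WHERE hit_time < ?", (cutoff,))
    return cur.rowcount
```

Doing both statements inside one transaction means a crash between them cannot count the same raw rows twice on the next run.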

2.6 system_state — No More Flat Files

CREATE TABLE system_state (
    key     TEXT PRIMARY KEY,
    value   TEXT
);

-- Key entries:
-- 'last_rebuild_time'      : epoch of last successful rebuild
-- 'last_entry_count'       : CIDR count at last successful rebuild
-- 'rebuild_in_progress'    : '1' if a rebuild is currently running (crash detection)
-- 'last_public_ip'         : server's external IP at last rebuild
-- 'config_hash'            : SHA-256 of /etc/enterprise_shield/config.conf
-- 'last_abuseipdb_refresh' : epoch of last AbuseIPDB refresh
-- 'hits_parser_position'   : byte offset in hits.log (resume parsing from here)

The rebuild_in_progress flag is the crash recovery mechanism. If the system reboots mid-rebuild, restore.py detects this flag on boot, logs the crash, and restores the last committed state from the database (see section 4.4).


3. The Rebuild Flow

3.1 Nightly Rebuild (shield.py rebuild)

[cron: 30 2 * * *]
         │
         ▼
┌─────────────────────────────────────┐
│  Acquire PID lock                   │
│  Set rebuild_in_progress = 1 in DB  │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│  Check config SHA-256 hash          │
│  If changed: re-import config file  │
│  Destroy any orphan staging ipsets  │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│  WHOIS refresh (only stale ASNs)    │
│  Staleness threshold: 14 days       │
│  Typically 2-5 ASNs per night       │
│  On failure: keep existing CIDRs,   │
│  increment fail_count               │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│  Country block refresh              │
│  ETag conditional GET to GitHub     │
│  304 Not Modified: skip download    │
│  200 OK: parse and update DB        │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│  Compute delta                      │
│  DB state vs. live ipset state      │
│  Entries to ADD: new CIDRs in DB    │
│  Entries to DELETE: removed CIDRs   │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│  DELTA SAFETY CHECK                 │
│  New count < (last × 0.95)?         │
│  YES → ABORT, preserve current set  │
│  NO  → proceed                      │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│  Apply delta to staging ipsets      │
│  ipset swap staging → live (atomic) │
│  ipset destroy staging sets         │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│  Update DB:                         │
│  - last_rebuild_time                │
│  - last_entry_count                 │
│  - rebuild_in_progress = 0          │
│  Release PID lock                   │
└─────────────────────────────────────┘
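
The delta safety check in the flow above boils down to one comparison. A minimal sketch, assuming the 0.95 factor is a configurable ratio and that a first run with no previous count should always proceed (both assumptions, as is the function name):

```python
# Sketch only: refuse to apply a rebuild that would shrink the block list
# by more than 5% compared with the last successful run.

def rebuild_is_safe(new_count: int, last_count: int, min_ratio: float = 0.95) -> bool:
    """Return False when new_count < last_count * min_ratio (the ABORT branch)."""
    if last_count <= 0:
        # No previous successful rebuild recorded: nothing to compare against.
        return True
    return new_count >= last_count * min_ratio
```

The point of the check is that a sudden large drop is far more likely to be a failed WHOIS or download than a genuine mass de-listing, so keeping yesterday's sets is the safer default.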

3.2 The Atomic Swap

The ipset swap is the critical section of the rebuild. The live set is never empty:

shield_block_staging (new CIDRs)
         │
         │  ipset swap shield_block_staging shield_block
         ▼
shield_block (now has new CIDRs — atomically)
         │
         │  ipset destroy shield_block_staging
         ▼
(staging set gone)

If the swap fails for any reason, the original shield_block is untouched and the staging set is left for cleanup on the next run.
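
One way to drive this sequence is to generate a script in the line format that `ipset restore` reads from standard input. The sketch below is an assumption about how that could look, not the shield.py implementation: the helper name is invented, and it fills the staging set with the complete new list rather than a delta.

```python
# Sketch only: build the staging set, swap it with the live set, then
# destroy the staging set. Output is text in `ipset restore` format.

def swap_script(live: str, cidrs: list[str], maxelem: int = 1000000, hashsize: int = 131072) -> str:
    """Return an ipset restore script that replaces the contents of `live`."""
    staging = f"{live}_staging"
    lines = [f"create {staging} hash:net family inet hashsize {hashsize} maxelem {maxelem}"]
    lines += [f"add {staging} {cidr}" for cidr in cidrs]
    # After the swap, `live` holds the new entries and `staging` the old ones.
    lines.append(f"swap {staging} {live}")
    lines.append(f"destroy {staging}")
    return "\n".join(lines) + "\n"


script = swap_script("shield_block", ["185.220.0.0/16", "198.51.100.0/24"])
```

The script could then be fed to the kernel with something like `subprocess.run(["ipset", "restore"], input=script, text=True, check=True)`. The swap only works when both sets exist and have the same type, which is why the staging set is created with the same hash:net parameters as the live one.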


4. The Boot Persistence Architecture

This is one of the most critical (and most debugged) parts of the system. Getting the ordering wrong leaves the server unprotected between reboot and first cron run.

4.1 The Two-Service Boot Sequence

Boot sequence:

kernel loads
    │
    ▼
systemd-modules-load.service     ← ensures ip_tables, ip_set kernel modules are loaded
    │
    ▼
local-fs.target                  ← ensures /var/lib/enterprise_shield/ is mounted
    │
    ▼
enterprise-shield-ipset-restore.service    ← runs restore.py --ipsets-only
    │  Loads all 7 ipsets from SQLite
    │  Sets ipsets before UFW needs them
    │
    ▼
ufw.service                      ← UFW loads its rules
    │  The ipsets now exist when UFW's before.rules references them
    │
    ▼
enterprise-shield-chain-restore.service   ← runs restore.py --chains-only
       Rebuilds SHIELD-LOGIC, AZURE-RATELIMIT, CLOUD-RATELIMIT chains
       Inserts "INPUT -j SHIELD-LOGIC" at position 1
       Uses --noflush so UFW's chains are preserved

4.2 Why the Ordering Matters

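┌───────────────┬─────────────────┬───────────────────────────────────────┐
│ Service       │ Must run BEFORE │ Must run AFTER                        │
├───────────────┼─────────────────┼───────────────────────────────────────┤
│ ipset-restore │ ufw.service     │ systemd-modules-load, local-fs.target │
│ chain-restore │ (nothing)       │ ufw.service, ipset-restore            │
└───────────────┴─────────────────┴───────────────────────────────────────┘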

The DefaultDependencies=no setting is deliberately not used. An early version of the units set it, which stripped the default dependencies too aggressively and prevented the kernel module loading dependency from being honoured (a lesson learned from a failed boot cycle).

4.3 What restore.py Does

# restore.py --ipsets-only
for each ipset in [shield_allow, shield_abuseipdb, shield_block,
                   shield_penalty, shield_azure, shield_hyperscaler]:
    create ipset if not exists
    bulk-load CIDRs from cidr_blocks WHERE ipset_target = ipset

# restore.py --chains-only
create SHIELD-LOGIC chain (flush if exists)
add ESTABLISHED/RELATED ACCEPT rule
add loopback ACCEPT rule
add LAN + own IP ACCEPT rule  (reads last_public_ip from system_state)
add shield_allow ACCEPT rule
add shield_abuseipdb DROP+LOG rule
add shield_block DROP+LOG rule
add shield_penalty DROP+LOG rule
add shield_azure → AZURE-RATELIMIT rule
add shield_hyperscaler → CLOUD-RATELIMIT rule
create AZURE-RATELIMIT chain with hashlimit rules
create CLOUD-RATELIMIT chain with hashlimit rules
iptables -I INPUT 1 -j SHIELD-LOGIC

4.4 Crash Recovery

If the server loses power mid-rebuild:

Next boot
    │
    ▼
enterprise-shield-ipset-restore.service
    │
    ▼
restore.py checks system_state WHERE key='rebuild_in_progress'
    │
    ├── value = '0': normal restore from DB
    │
    └── value = '1': CRASH DETECTED
             │
             ▼
        Log CRITICAL to /var/log/enterprise_shield/restore.log
        Load last known-good CIDRs from DB (last successfully committed state)
        Continue with normal restore
        Set rebuild_in_progress = 0

The last known-good state is whatever was in the database at the time of the crash. Because the DB commit happens after the ipset swap succeeds, any incomplete rebuild simply means the previous night’s CIDRs are loaded — which is correct behaviour.
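
The crash check described above could be sketched like this. The function names and the `load_ipsets` callback are hypothetical stand-ins for whatever restore.py actually does; only the system_state table and the rebuild_in_progress key come from the schema in section 2.6.

```python
import logging
import sqlite3

log = logging.getLogger("enterprise_shield.restore")


def crash_detected(conn: sqlite3.Connection) -> bool:
    """True when the previous rebuild never cleared its in-progress flag."""
    row = conn.execute(
        "SELECT value FROM system_state WHERE key = 'rebuild_in_progress'"
    ).fetchone()
    return row is not None and row[0] == "1"


def restore_with_recovery(conn: sqlite3.Connection, load_ipsets) -> bool:
    """Run the normal restore; log and clear the flag if a crash was detected."""
    crashed = crash_detected(conn)
    if crashed:
        log.critical("rebuild_in_progress was set at boot; restoring last committed state")
    # Same code path either way: the DB only ever holds committed CIDRs.
    load_ipsets(conn)
    if crashed:
        with conn:
            conn.execute(
                "INSERT OR REPLACE INTO system_state (key, value) "
                "VALUES ('rebuild_in_progress', '0')"
            )
    return crashed
```

Clearing the flag only after the load has finished means a second power loss during the restore is still detected on the following boot.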


5. The Hit Logging Pipeline

iptables LOG rule fires
    │
    │  Kernel writes to syslog
    ▼
/var/log/syslog
    │
    │  rsyslog matches: if $msg contains '[SHIELD_'
    ▼
/var/log/enterprise_shield/hits.log
    │
    │  hits_parser.py runs every 5 minutes via cron
    │  Resumes from byte offset stored in system_state.hits_parser_position
    ▼
firewall_hits table (SQLite)
    │
    │  hits_rollup.py runs at 03:00 daily
    │  Aggregates rows older than RAW_RETENTION_DAYS into hits_hourly
    │  Deletes aggregated raw rows
    ▼
hits_hourly table (SQLite)
    │
    │  deep_shield.py runs every 10 minutes
    │  Queries both tables, enriches IPs, computes campaigns
    ▼
private threat analysis dashboard

    │  public_shield.py runs hourly
    ▼
https://performancezen.com/shield/public_shield.html
(public-facing summary — sanitised, no internal data)

5.1 rsyslog Configuration

# /etc/rsyslog.d/10-enterprise-shield.conf
:msg, contains, "[SHIELD_" /var/log/enterprise_shield/hits.log
& stop

The & stop prevents these messages from also going into /var/log/syslog, keeping the main syslog uncluttered.

5.2 hits_parser.py — Resumable Parsing

The parser reads hits.log from the byte offset stored in system_state.hits_parser_position. On each 5-minute run, it:

  1. Opens the file, seeks to the stored position
  2. Reads all new lines since the last run
  3. Parses each [SHIELD_*] log line for SRC=, DPT=, PROTO=
  4. Inserts rows into firewall_hits
  5. Updates hits_parser_position to the new end-of-file offset

If the log file is rotated (via logrotate), the parser detects the file is smaller than the stored offset and resets to position 0.
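
The offset handling, including the rotation reset, can be sketched in a few lines. The function name is an assumption, and so is the choice to leave a half-written final line for the next run; the source only describes the seek, read, and reset behaviour.

```python
import os

# Sketch only: read whatever was appended since the stored byte offset.

def read_new_lines(path: str, stored_offset: int) -> tuple[list[str], int]:
    """Return (new complete lines, offset to store for the next run)."""
    size = os.path.getsize(path)
    # A file smaller than the stored offset means logrotate replaced it.
    offset = 0 if size < stored_offset else stored_offset
    with open(path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    # Stop at the last newline so a partially written line is not parsed.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

Working in bytes rather than characters keeps the stored position valid as a seek target even if a log line contains non-ASCII data.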


6. The AbuseIPDB Integration

AbuseIPDB operates on a completely separate lifecycle from the main rebuild:

[cron: 0 0,5,10,15,20 * * *]  (5× daily)
         │
         ▼
abuseipdb.py refresh
    │
    ├── Query AbuseIPDB /api/v2/blacklist
    │   Parameters: confidenceMinimum=90, limit=10000
    │
    ├── Parse response → list of {ip, abuseConfidenceScore, countryCode}
    │
    ├── Diff against abuseipdb_entries table:
    │   - New IPs: INSERT into table, add to shield_abuseipdb ipset
    │   - Removed IPs: DELETE from table, remove from shield_abuseipdb ipset
    │   - Unchanged IPs: update refreshed_at timestamp only
    │
    └── Log summary: X added, Y removed, Z unchanged

The AbuseIPDB ipset (shield_abuseipdb) is kept live-updated between the main nightly rebuilds. A known-bad IP that appears on AbuseIPDB is blocked within 5 hours at most, without waiting for the 02:30 rebuild.
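
The diff step of the refresh might look like the sketch below. It assumes the blacklist response is JSON with a top-level `data` list whose items carry `ipAddress` and `abuseConfidenceScore`, and it takes the already-known IPs as a plain set. The function name is invented, and no network call is made here.

```python
# Sketch only: classify a fetched blacklist against the IPs already stored
# in abuseipdb_entries. The payload shape is an assumption (see lead-in).

def diff_blacklist(payload: dict, known_ips: set[str], threshold: int = 90):
    """Return (added, removed, unchanged) as sorted lists of IP strings."""
    fresh = {
        entry["ipAddress"]
        for entry in payload.get("data", [])
        if entry.get("abuseConfidenceScore", 0) >= threshold
    }
    known = set(known_ips)
    added = sorted(fresh - known)       # INSERT + ipset add
    removed = sorted(known - fresh)     # DELETE + ipset del
    unchanged = sorted(fresh & known)   # only refreshed_at changes
    return added, removed, unchanged


payload = {
    "data": [
        {"ipAddress": "198.51.100.7", "abuseConfidenceScore": 100, "countryCode": "ZZ"},
        {"ipAddress": "203.0.113.9", "abuseConfidenceScore": 95, "countryCode": "ZZ"},
        {"ipAddress": "192.0.2.44", "abuseConfidenceScore": 60, "countryCode": "ZZ"},
    ]
}
added, removed, unchanged = diff_blacklist(payload, {"203.0.113.9", "192.0.2.1"})
```

The threshold is re-checked locally even though confidenceMinimum=90 is sent with the request, so an unexpected response cannot load low-confidence addresses into the ipset.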


7. The Penalty Box

The penalty box (shield_penalty ipset) handles time-limited blocks — typically IPs that have triggered specific application-layer rules or been manually added for investigation.

Adding to penalty box:
    shield.py penalty add <ip> [--hours N]  (default: 24 hours)
        │
        ├── INSERT into penalty_entries table with expires_at timestamp
        └── ipset add shield_penalty <ip>

Expiry check (every 15 minutes via cron):
    penalty.py expire
        │
        ├── SELECT from penalty_entries WHERE expires_at < NOW()
        ├── For each expired entry:
        │   ipset del shield_penalty <ip>
        │   DELETE from penalty_entries
        └── Log: "Expired N penalty entries"

The penalty box also survives reboots — restore.py loads shield_penalty from the database on boot, but only entries whose expires_at is still in the future. Expired entries are not restored.
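
Both the expiry job and the boot-time filter reduce to a comparison against expires_at. The sketch below assumes a minimal penalty_entries table with ip and expires_at (epoch seconds) columns; the article does not show that schema, so the column names and both function names are hypothetical.

```python
import sqlite3

# Sketch only: assumes penalty_entries(ip TEXT PRIMARY KEY, expires_at INTEGER).

def expire_penalties(conn: sqlite3.Connection, now: int) -> list[str]:
    """Delete expired rows and return their IPs so the caller can `ipset del` them."""
    rows = conn.execute(
        "SELECT ip FROM penalty_entries WHERE expires_at < ? ORDER BY ip", (now,)
    ).fetchall()
    with conn:
        conn.execute("DELETE FROM penalty_entries WHERE expires_at < ?", (now,))
    return [ip for (ip,) in rows]


def restorable_penalties(conn: sqlite3.Connection, now: int) -> list[str]:
    """IPs that should be loaded into shield_penalty at boot (not yet expired)."""
    rows = conn.execute(
        "SELECT ip FROM penalty_entries WHERE expires_at > ? ORDER BY ip", (now,)
    ).fetchall()
    return [ip for (ip,) in rows]
```

Filtering at restore time matters because the server may have been down across an expiry: without it, a reboot would silently extend a block that should already have ended.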


8. Module Structure

All Python modules live at /usr/local/lib/enterprise_shield/:

/usr/local/lib/enterprise_shield/
├── shield.py          ← Main CLI: rebuild, add, add-asn, check, status
├── restore.py         ← Boot restore: --ipsets-only, --chains-only
├── abuseipdb.py       ← AbuseIPDB refresh daemon
├── hits_parser.py     ← rsyslog hits.log → firewall_hits table
├── hits_rollup.py     ← firewall_hits → hits_hourly aggregation
├── penalty.py         ← Penalty box expiry
├── deep_shield.py     ← Private threat analysis dashboard generator
├── public_shield.py   ← Public summary dashboard generator
├── db.py              ← Database connection and schema management
└── config.py          ← Configuration constants and file parsing
