Password policy auditing with NIST 800-63B compliance — obfus.link

1. Insight

The problem this article addresses and why it matters.

Composition rules are 2003 advice that hasn't aged well

For two decades the standard password-policy advice stayed the same: a minimum of 8 characters, with at least one uppercase letter, one lowercase letter, one digit, and one special character. Most apps still enforce some variation. The advice was wrong by 2010 and definitively retired by NIST in 2017, when SP 800-63B reversed the recommendations: enforce a minimum length (at least 8 characters, with at least 64 supported), don't impose composition rules, screen against lists of known-breached passwords, and don't force periodic rotation.

The new guidance reflects what password-cracking research has shown for years: composition rules push users toward predictable substitutions (P@ssw0rd instead of password), the search space is dominated by length not character variety, and forcing rotation produces weaker passwords (Password1! → Password2!) not stronger ones.
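The point that length outweighs character variety follows from the uniform-random entropy formula, bits = length × log2(pool size). The sketch below works through that arithmetic. It assumes characters are drawn uniformly at random, which human-chosen passwords are not, and `naiveEntropyBits` is a hypothetical helper rather than part of the tool's API.

```typescript
// Entropy in bits of `length` symbols drawn uniformly at random
// from a pool of `poolSize` symbols: length * log2(poolSize).
function naiveEntropyBits(length: number, poolSize: number): number {
  return length * Math.log2(poolSize);
}

// 8 characters from all 95 printable ASCII symbols
// versus 16 characters from the 26 lowercase letters.
const shortComplex = naiveEntropyBits(8, 95); // about 52.6 bits
const longSimple = naiveEntropyBits(16, 26);  // about 75.2 bits

console.log(shortComplex.toFixed(1), longSimple.toFixed(1));
```

Doubling the length adds far more bits than widening the pool from 26 to 95 symbols, which is the arithmetic behind the length-first guidance.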

Most internal "review the auth policy" projects still end up reviewing the wrong things. Length is what matters; entropy is what you should measure; NIST 800-63B is what the policy should align with.

Why audit the policy, not just the password

The tool in this article has two modes. Password mode evaluates an individual password — useful for end-user password-strength meters and security audits of seeded test accounts. Policy mode evaluates the theoretical strength of a policy: given the minimum length, character set requirements, and breached-password screening, how strong are the worst allowed passwords?

A policy with minLength: 8, requireUpper: true, requireDigit: true allows a worst-case password like Aaaaaaa1 — eight characters, technically meets all rules, would be cracked in under a second. The policy auditor surfaces this by calculating the entropy of the worst allowed password against modern attack costs.
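One way to picture the "worst allowed password" is to build the laziest string that still satisfies every rule. The following is a minimal sketch under that assumption; `CompositionPolicy` and `worstCasePassword` are hypothetical names used for illustration, and the tool's own worst-case model may differ.

```typescript
interface CompositionPolicy {
  minLength: number;
  requireUpper?: boolean;
  requireLower?: boolean;
  requireDigit?: boolean;
  requireSpecial?: boolean;
}

// One character per required class, padded with a repeated lowercase letter.
function worstCasePassword(p: CompositionPolicy): string {
  const prefix = p.requireUpper ? "A" : "";
  const suffix = (p.requireDigit ? "1" : "") + (p.requireSpecial ? "!" : "");
  // Keep at least one lowercase letter when the policy demands one.
  const minFill = p.requireLower ? 1 : 0;
  const fill = Math.max(p.minLength - prefix.length - suffix.length, minFill);
  return prefix + "a".repeat(fill) + suffix;
}

const worst = worstCasePassword({ minLength: 8, requireUpper: true, requireDigit: true });
console.log(worst, worst.length);
```

For the policy described above this yields an eight-character string of one uppercase letter, six repeated lowercase letters, and one digit: compliant on paper, trivial to guess.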

What this article delivers

Both modes walked end-to-end against a real auth-system audit. The password mode evaluates a few representative inputs (a weak one, a moderate one, a strong one) with crack-time estimates. The policy mode runs against two policy variants, a composition-heavy policy and a NIST 800-63B-aligned one, and produces a grade plus structured findings.

2. Intent

What you will be able to do after reading.

By the end of this article you will be able to:

Calculate entropy and crack-time estimates for any password
Evaluate a password policy against NIST 800-63B recommendations and get a letter grade
Identify weak policies that allow easily-guessable passwords despite "meeting" composition rules
Read the policy auditor's NIST findings and map each one back to the specific SP 800-63B requirement
Recognise the cases where neither mode is sufficient — high-value systems need a real password-cracking benchmark, not entropy math

The Examples section walks through evaluating three passwords and two policies and reading the NIST compliance output.

3. Examples

Annotated code and worked scenarios.

Before / after: evaluating three passwords

Three passwords across the strength spectrum:

Password	Composition	Length
`password`	lowercase	8
`MySecure9!`	mixed	10
`correct horse battery staple`	lowercase + spaces	28

passwordEntropy({ mode: 'password', password: 'password' });
// entropy: 18, crackTime: '2 milliseconds', strength: 'weak'
// issues: [
//   { issue: 'Common dictionary word — first guess in any cracker', severity: 'critical' },
//   { issue: 'Lowercase only — small character class', severity: 'warning' },
//   { issue: 'Short — below 12 characters', severity: 'warning' },
// ]

passwordEntropy({ mode: 'password', password: 'MySecure9!' });
// entropy: 49, crackTime: '3 hours', strength: 'fair'
// issues: [
//   { issue: 'Below NIST 12-character recommendation for memorised secrets', severity: 'info' },
// ]

passwordEntropy({ mode: 'password', password: 'correct horse battery staple' });
// entropy: 132, crackTime: '3 centuries', strength: 'extreme'
// issues: []

The XKCD-style passphrase scores 132 bits despite being lowercase-only — length dominates. MySecure9! looks "strong" by composition rules but takes 3 hours to crack with modern hashrates; the passphrase takes 3 centuries.

Before / after: auditing a policy

A team has this auth policy:

passwordEntropy({
  mode: 'policy',
  policy: {
    minLength:      8,
    requireUpper:   true,
    requireLower:   true,
    requireDigit:   true,
    requireSpecial: true,
    bannedWords:    [],
  },
  nistCompliance: true,
});

// policyEntropy:  40       (bits of entropy in the worst allowed password)
// policyGrade:    'C'
// nistFindings: [
//   {
//     requirement: 'NIST SP 800-63B §5.1.1.2: Memorized secrets SHALL be at least 8 chars',
//     status:      'pass',
//     recommendation: '',
//   },
//   {
//     requirement: 'NIST SP 800-63B §5.1.1.2: SHOULD support at least 64 chars',
//     status:      'partial',
//     recommendation: 'Verify your password-hashing field can store passwords up to 64 chars',
//   },
//   {
//     requirement: 'NIST SP 800-63B §5.1.1.2: SHOULD NOT impose composition rules',
//     status:      'fail',
//     recommendation: 'Drop the requireUpper/requireLower/requireDigit/requireSpecial rules. Composition rules push users toward predictable substitutions and reduce real-world entropy.',
//   },
//   {
//     requirement: 'NIST SP 800-63B §5.1.1.2: SHALL screen against breached passwords',
//     status:      'fail',
//     recommendation: 'Add a check against a known-breached password list (HaveIBeenPwned API or local hashed corpus).',
//   },
//   {
//     requirement: 'NIST SP 800-63B §5.1.1.2: SHALL NOT require periodic rotation',
//     status:      'pass',
//     recommendation: '',
//   },
// ]

Grade C with two fail findings. The recommendation list is the action plan: drop composition rules, add breached-password screening, recommend length over complexity.
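Turning the findings into an action plan is a small filtering step. The sketch below assumes the `nistFindings` shape shown in the output above; `actionPlan` is a hypothetical helper, not something the tool is documented to provide.

```typescript
type Status = "pass" | "partial" | "fail";

interface NistFinding {
  requirement: string;
  status: Status;
  recommendation?: string;
}

// Keep only findings that need work, failures first, and pair each
// requirement with its recommendation.
function actionPlan(findings: NistFinding[]): string[] {
  const order: Record<Status, number> = { fail: 0, partial: 1, pass: 2 };
  return findings
    .filter((f) => f.status !== "pass")
    .sort((a, b) => order[a.status] - order[b.status])
    .map((f) => `[${f.status}] ${f.requirement} -> ${f.recommendation ?? ""}`);
}

// Abbreviated findings, for illustration only.
const plan = actionPlan([
  { requirement: "min length 8", status: "pass", recommendation: "" },
  { requirement: "support 64 chars", status: "partial", recommendation: "verify field size" },
  { requirement: "no composition rules", status: "fail", recommendation: "drop require* rules" },
]);
console.log(plan.join("\n"));
```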

Before / after: a NIST-aligned policy

passwordEntropy({
  mode: 'policy',
  policy: {
    minLength:      12,
    maxLength:      64,
    requireUpper:   false,
    requireLower:   false,
    requireDigit:   false,
    requireSpecial: false,
    bannedWords:    ['common-breached-corpus'],
  },
  nistCompliance: true,
});

// policyEntropy: 84
// policyGrade:   'A'
// nistFindings: [
//   {
//     requirement: 'NIST SP 800-63B §5.1.1.2: Memorized secrets SHALL be at least 8 chars',
//     status: 'pass',
//   },
//   { ... 'partial' status for screening (depending on actual breached list provided) ... },
//   {
//     requirement: 'NIST SP 800-63B §5.1.1.2: SHOULD NOT impose composition rules',
//     status: 'pass',
//   },
// ]

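Grade A. The policy aligns with NIST guidance: it is length-focused, imposes no composition rules, and requires breached-password screening. The tool reports 84 bits for the worst-case allowed 12-character password, which the strength table places in the extreme band (centuries or more at the assumed hashrate). That figure works out to 7 bits per character; a password drawn uniformly from lowercase letters alone would carry about 56 bits (12 × log2 26), so read the reported value as the tool's model rather than a lowercase-only bound.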

Before / after: policy auditor as a security review tool

// In a security-review checklist for any auth project:
const result = passwordEntropy({
  mode:           'policy',
  policy:         actualPolicyFromCode,
  nistCompliance: true,
});

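// Grades are letters, so test against an explicit list: a plain string
// comparison such as policyGrade < 'B' would match only 'A'.
const needsReview =
  ['C', 'D', 'F'].includes(result.policyGrade) ||
  (result.nistFindings ?? []).some((f) => f.status === 'fail');

if (needsReview) {
  // flag for security review
}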

The grade becomes the gate criterion: anything below B fails the security gate, and the nistFindings array becomes the agenda for the review meeting.

When humans use this

A developer building a password-strength meter uses the password mode to drive the meter's classification — weak/fair/strong/extreme map directly to the meter's colour states. A security team auditing an internal auth system uses the policy mode to grade the policy against NIST and produces a remediation plan from the findings.

When agents use this

Two production patterns:

Auth-config review. An agent reviewing PRs to authentication code runs the policy mode against the configured policy. Any change that lowers the grade fails the PR, as does any new fail finding. This wires NIST 800-63B compliance into CI as a deterministic check.
Password-strength feedback. A signup-flow agent uses password mode to give real-time feedback on user passwords. The crack-time estimate (2 milliseconds vs 3 centuries) is more compelling to users than a generic "strong" badge: they can see what they gain by adding length.
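The auth-config review pattern can be sketched as a comparison between the audit result on the base branch and on the PR head. Everything here is an assumption for illustration: `PolicyResult` mirrors the documented output fields, and `prRegressions` and the numeric grade ranking are hypothetical.

```typescript
type Grade = "A" | "B" | "C" | "D" | "F";

interface PolicyResult {
  policyGrade: Grade;
  nistFindings: { requirement: string; status: "pass" | "partial" | "fail" }[];
}

// Letter grades do not order correctly as strings, so rank them explicitly.
const GRADE_RANK: Record<Grade, number> = { A: 4, B: 3, C: 2, D: 1, F: 0 };

// Returns the reasons a PR should fail; an empty list means it passes.
function prRegressions(base: PolicyResult, head: PolicyResult): string[] {
  const reasons: string[] = [];
  if (GRADE_RANK[head.policyGrade] < GRADE_RANK[base.policyGrade]) {
    reasons.push(`grade dropped from ${base.policyGrade} to ${head.policyGrade}`);
  }
  const alreadyFailing = new Set(
    base.nistFindings.filter((f) => f.status === "fail").map((f) => f.requirement)
  );
  for (const f of head.nistFindings) {
    if (f.status === "fail" && !alreadyFailing.has(f.requirement)) {
      reasons.push(`new failing requirement: ${f.requirement}`);
    }
  }
  return reasons;
}
```

A CI step would run the audit twice, call `prRegressions`, and exit non-zero when the returned list is non-empty.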

Edge cases

Banned-word lists

The bannedWords policy parameter accepts a list. The auditor treats any non-empty list as "screening is configured" for the NIST check; the actual screening behaviour depends on the implementation. To verify your screening matches the HaveIBeenPwned breached corpus (the canonical reference), run the screening against the HIBP API at signup time — the tool doesn't enforce screening, it audits the policy's claim to do so.
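Screening against the Have I Been Pwned corpus is usually done through the Pwned Passwords range endpoint, which takes the first five characters of the SHA-1 hash and returns matching suffixes with counts, so the full hash never leaves your system. The sketch below assumes Node 18+ (for the global `fetch`); the function names are hypothetical, and error handling, retries, and rate limiting are left out.

```typescript
import { createHash } from "node:crypto";

// Split the uppercase SHA-1 hex digest into the 5-character prefix that is
// sent to the API and the 35-character suffix that stays local.
function hibpParts(password: string): { prefix: string; suffix: string } {
  const hex = createHash("sha1").update(password, "utf8").digest("hex").toUpperCase();
  return { prefix: hex.slice(0, 5), suffix: hex.slice(5) };
}

// The range endpoint returns lines of the form "SUFFIX:COUNT".
function breachCount(rangeBody: string, suffix: string): number {
  for (const line of rangeBody.split(/\r?\n/)) {
    const [candidate, count] = line.trim().split(":");
    if (candidate === suffix) return Number(count);
  }
  return 0;
}

// Network call; not exercised here.
async function isBreached(password: string): Promise<boolean> {
  const { prefix, suffix } = hibpParts(password);
  const res = await fetch(`https://api.pwnedpasswords.com/range/${prefix}`);
  if (!res.ok) throw new Error(`HIBP range request failed: ${res.status}`);
  return breachCount(await res.text(), suffix) > 0;
}
```

Calling `isBreached` at signup is what actually enforces the screening that a non-empty `bannedWords` list only claims.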

Multilingual passwords

The crack-time estimate assumes ASCII-printable character classes. Unicode passwords (CJK characters, emoji) have much higher per-character entropy in theory, but real-world crackers target ASCII first — the entropy advantage holds against generic crackers, not against ones tuned for non-ASCII inputs.
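Unicode passwords also raise a measurement question: the "length" of a string depends on what is counted. SP 800-63B counts each Unicode code point as one character and suggests normalizing before hashing. The sketch below shows how that differs from JavaScript's `String.length`; `characterCounts` is a hypothetical helper, and NFKC is one of the normalization forms the guideline mentions.

```typescript
// Compare JavaScript's UTF-16 length with the code-point count
// after NFKC normalization.
function characterCounts(password: string): { utf16Units: number; codePoints: number } {
  const normalized = password.normalize("NFKC");
  return {
    utf16Units: normalized.length,             // what String.length reports
    codePoints: Array.from(normalized).length, // one per Unicode code point
  };
}

// Four ASCII letters followed by one emoji (U+1F511, outside the BMP).
const counts = characterCounts("pass\u{1F511}");
console.log(counts); // utf16Units is 6, codePoints is 5
```

A minimum-length check built on `String.length` would over-count this password by one, so a policy validator should count code points.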

Passphrase-vs-password distinction

correct horse battery staple scores high because of length. Real-world passphrase strength depends on word-list size and selection method — random words from a 7776-word Diceware list, picked truly uniformly, are the gold standard. A user-chosen passphrase from a smaller mental vocabulary is weaker than the entropy math suggests. The tool's entropy calculation is the upper bound; real-world strength may be lower.
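The gap between character-based entropy and word-based entropy can be made concrete. Assuming words are drawn uniformly and independently from the list, each Diceware word contributes log2(7776) bits, roughly 12.9. `passphraseEntropyBits` below is a hypothetical helper for that arithmetic.

```typescript
// Bits of entropy for `wordCount` words drawn uniformly and independently
// from a list of `listSize` words.
function passphraseEntropyBits(wordCount: number, listSize: number): number {
  return wordCount * Math.log2(listSize);
}

const fourWords = passphraseEntropyBits(4, 7776); // about 51.7 bits
const sixWords = passphraseEntropyBits(6, 7776);  // about 77.5 bits

console.log(fourWords.toFixed(1), sixWords.toFixed(1));
```

Against an attacker who knows the word list, a four-word phrase is worth about 52 bits, well below the 132 bits the character-based calculation reports for the same string.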

Maximum length truncation

Some legacy password-hashing setups cap input at 72 bytes (bcrypt's input limit) or 32 characters (some older systems). Passwords longer than the cap get silently truncated by the hasher. The tool surfaces a warning if the policy's maxLength exceeds 72 (bcrypt) or 64 (the threshold it applies to Argon2 setups; Argon2 itself has no comparable input limit, so any such cap comes from configuration). Verify that your hashing implementation supports the declared maximum.
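Because bcrypt's limit is measured in bytes, a check on character count can miss truncation for non-ASCII passwords. A minimal sketch of a byte-based guard, assuming passwords are encoded as UTF-8 before hashing; the helper names are hypothetical.

```typescript
const BCRYPT_MAX_BYTES = 72;

// Length of the password as the hasher sees it: UTF-8 bytes, not characters.
function utf8ByteLength(password: string): number {
  return new TextEncoder().encode(password).length;
}

// True when bcrypt would ignore the tail of this password.
function wouldBeTruncatedByBcrypt(password: string): boolean {
  return utf8ByteLength(password) > BCRYPT_MAX_BYTES;
}

console.log(wouldBeTruncatedByBcrypt("a".repeat(72)));      // false: exactly 72 bytes
console.log(wouldBeTruncatedByBcrypt("\u00e9".repeat(40))); // true: 40 characters, 80 bytes
```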

4. Documentation

Reference signatures, edge cases, and lookup tables.

Input parameters

Field	Type	Required	Default	Description
`mode`	`'password' | 'policy'`	✓	—	Workflow selector
`password`	`string`	for password mode	—	Single password to evaluate
`policy`	object	for policy mode	—	Policy parameters (see below)
`nistCompliance`	`boolean`	✗	`false`	Check against NIST SP 800-63B requirements

Policy object shape

policy: {
  minLength:      number;     // SHALL be >= 8 per NIST
  maxLength?:     number;     // SHOULD be >= 64 per NIST
  requireUpper:   boolean;
  requireLower:   boolean;
  requireDigit:   boolean;
  requireSpecial: boolean;
  allowedSpecial?: string;    // explicit set; defaults to all printable ASCII
  bannedWords?:    string[];  // signals breached-password screening is configured
}

Output shape

{
  // password mode
  entropy?:    number;       // bits
  crackTime?:  string;       // human readable ('3 milliseconds', '3 centuries')
  strength?:   'weak' | 'fair' | 'strong' | 'extreme';
  issues?:     Array<{ issue: string; severity: 'critical' | 'warning' | 'info' }>;

  // policy mode
  policyEntropy?: number;    // bits of the worst allowed password
  policyGrade?:   'A' | 'B' | 'C' | 'D' | 'F';
  nistFindings?: Array<{
    requirement:     string;  // NIST section reference plus a summary of the requirement
    status:          'pass' | 'partial' | 'fail';
    recommendation?: string;
  }>;
}

Strength thresholds

Bits of entropy	Strength	Approximate crack time (modern hashrate)
0-30	weak	milliseconds to seconds
30-50	fair	seconds to hours
50-80	strong	days to years
80+	extreme	centuries+
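The thresholds above translate into a small classifier. The table's ranges share their endpoints, so this sketch assumes each band includes its lower bound and excludes its upper one; that boundary choice and the name `strengthFromBits` are assumptions, not documented behaviour.

```typescript
type Strength = "weak" | "fair" | "strong" | "extreme";

// Map bits of entropy to a strength band, lower bound inclusive.
function strengthFromBits(bits: number): Strength {
  if (bits < 30) return "weak";
  if (bits < 50) return "fair";
  if (bits < 80) return "strong";
  return "extreme";
}

// The three worked examples from the Examples section.
console.log(strengthFromBits(18), strengthFromBits(49), strengthFromBits(132));
```

Under that assumption the classifier reproduces the labels reported for the three example passwords: weak, fair, and extreme.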

NIST 800-63B requirements covered

§5.1.1.2: Memorized secrets SHALL be at least 8 chars; SHOULD support at least 64
§5.1.1.2: SHALL screen against breached passwords (HIBP-style corpus)
§5.1.1.2: SHOULD NOT impose composition rules (uppercase/digit/special)
§5.1.1.2: SHOULD NOT require periodic rotation (Rev. 3 wording; Rev. 4 tightens this to SHALL NOT)
§5.1.1.2: SHOULD allow paste (no JS that blocks paste into password fields)
§5.1.1.2: SHOULD accept all printable ASCII characters, including space, and SHOULD accept Unicode
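Several of these checks can be derived directly from the policy object. The sketch below is one possible reading, not the tool's implementation: `checkPolicy` is hypothetical, and it assumes a missing `maxLength` is reported as partial and a non-empty `bannedWords` list counts only as a partial claim of screening, in line with the examples earlier in the article.

```typescript
interface Policy {
  minLength: number;
  maxLength?: number;
  requireUpper: boolean;
  requireLower: boolean;
  requireDigit: boolean;
  requireSpecial: boolean;
  bannedWords?: string[];
}

type Status = "pass" | "partial" | "fail";
interface Finding { requirement: string; status: Status }

function checkPolicy(p: Policy): Finding[] {
  const hasComposition = p.requireUpper || p.requireLower || p.requireDigit || p.requireSpecial;
  // Unknown maximum length cannot be confirmed either way.
  const maxStatus: Status =
    p.maxLength === undefined ? "partial" : p.maxLength >= 64 ? "pass" : "fail";
  // A configured list is a claim of screening, not proof of it.
  const screening: Status = (p.bannedWords?.length ?? 0) > 0 ? "partial" : "fail";
  return [
    { requirement: "minimum length of at least 8", status: p.minLength >= 8 ? "pass" : "fail" },
    { requirement: "supports at least 64 characters", status: maxStatus },
    { requirement: "no composition rules", status: hasComposition ? "fail" : "pass" },
    { requirement: "breached-password screening", status: screening },
  ];
}
```

Rotation, paste, and character-set acceptance are not expressed in the policy object, so they would need to be checked elsewhere.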

bcrypt vs scrypt vs Argon2: which password hashing algorithm

The tool audits password policies — but a strong policy paired with the wrong hashing algorithm still loses to a modern GPU attack. The hash function choice matters as much as the policy.

Property	bcrypt	scrypt	Argon2id
Year	1999	2009	2015 (PHC winner)
Cost knob	iterations (`work factor`)	iterations + memory + parallelism	iterations + memory + parallelism
Memory-hard	No	Yes	Yes (most aggressive)
GPU / ASIC resistance	Low (small memory footprint)	Medium	High
Input length limit	72 bytes (silently truncates)	None	None
Recommended cost (2026)	`cost=12`	`N=2^17, r=8, p=1`	`m=64MB, t=3, p=4`
OWASP recommendation	Acceptable	Acceptable	Preferred

Pick Argon2id for new systems. It won the Password Hashing Competition in 2015, is the OWASP-recommended default, and ships in modern stdlib / well-maintained libraries for every common runtime (Node, Python, Go, Rust, Java, .NET). Memory-hardness combined with tunable parallelism makes GPU and ASIC attacks expensive in ways bcrypt isn't.

Pick scrypt when Argon2 isn't available in your runtime (older Erlang/BEAM, embedded targets). Its memory-hardness blocks the cheap GPU attacks that bcrypt is vulnerable to. Note that scrypt is not a FIPS-approved algorithm either; FIPS-constrained environments are generally limited to PBKDF2.
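For Node, scrypt is available in the standard library, which makes it a convenient illustration. The sketch below uses the scrypt parameters from the comparison table; the `$`-separated storage format and the function names are made up for this example, and a production system would more likely rely on a maintained password-hashing library.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

interface ScryptParams { N: number; r: number; p: number }

const RECOMMENDED: ScryptParams = { N: 2 ** 17, r: 8, p: 1 };
// scrypt needs about 128 * N * r bytes (128 MiB at N = 2^17), which is
// above Node's default 32 MiB ceiling, so raise maxmem explicitly.
const MAX_MEM = 256 * 1024 * 1024;

function hashPassword(password: string, params: ScryptParams = RECOMMENDED): string {
  const salt = randomBytes(16);
  const key = scryptSync(password.normalize("NFKC"), salt, 32, { ...params, maxmem: MAX_MEM });
  // Store the parameters with the hash so they can be raised later.
  return [params.N, params.r, params.p, salt.toString("hex"), key.toString("hex")].join("$");
}

function verifyPassword(password: string, stored: string): boolean {
  const parts = stored.split("$");
  if (parts.length !== 5) return false;
  const [N, r, p, saltHex, keyHex] = parts as [string, string, string, string, string];
  const expected = Buffer.from(keyHex, "hex");
  const actual = scryptSync(password.normalize("NFKC"), Buffer.from(saltHex, "hex"), expected.length, {
    N: Number(N), r: Number(r), p: Number(p), maxmem: MAX_MEM,
  });
  // Constant-time comparison of two equal-length buffers.
  return timingSafeEqual(actual, expected);
}
```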

Pick bcrypt only when you're locked into a stack where it's the only mature option. Watch the 72-byte input truncation: passwords longer than 72 bytes hash identically to their 72-byte prefix. That's an actual exploitable bug surface in the wild (see Okta's 2024 advisory, in which a bcrypt-derived cache key built from user ID, username, and password stopped depending on the password once the username was long enough).

Don't use plain SHA-256, SHA-512, MD5, or single-round PBKDF2 for passwords. Plain cryptographic hashes are fast by design, which is exactly the wrong property for password storage. PBKDF2 is the bare-minimum legacy choice (FIPS environments); if you must use it, set the iteration count to 600,000 or more for PBKDF2-HMAC-SHA256 (OWASP 2026 guidance).

Cost tuning: the parameters above are starting points. Tune to your production hardware to land at 250-500ms per hash. Faster than 100ms is too cheap (attackers parallelize easily). Slower than 1s degrades login UX without buying meaningful additional security.
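Tuning to a latency target can be automated with a simple doubling search. The sketch below is generic over the hash function; `calibrateCost` is a hypothetical helper, and the demo uses PBKDF2 only because it ships with Node, not as a recommendation. Run it on production-class hardware, since the result is machine-specific.

```typescript
import { pbkdf2Sync } from "node:crypto";

// Double `cost` until a single hash takes at least `targetMs`,
// or until `maxCost` is reached.
function calibrateCost(
  hashOnce: (cost: number) => void,
  targetMs: number,
  startCost: number,
  maxCost: number
): number {
  let cost = startCost;
  while (cost < maxCost) {
    const started = performance.now();
    hashOnce(cost);
    if (performance.now() - started >= targetMs) break;
    cost *= 2;
  }
  return Math.min(cost, maxCost);
}

// Demo: find a PBKDF2-HMAC-SHA256 iteration count that takes about 250 ms.
const demoSalt = Buffer.alloc(16, 1);
const iterations = calibrateCost(
  (n) => { pbkdf2Sync("benchmark-password", demoSalt, n, 32, "sha256"); },
  250,
  10_000,
  10_000_000
);
console.log(`reached the 250 ms target at ${iterations} iterations on this machine`);
```

Doubling overshoots by up to a factor of two, so a finer search between the last two values may be worthwhile once the rough range is known.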

Error codes

Code	When it fires	Recovery
`INPUT_EMPTY`	Password or policy missing	Provide the required input for the chosen mode
`INPUT_INVALID_TYPE`	Policy `minLength` < 1 or > 128	Use a sensible minimum (8-12 recommended)
`INPUT_TOO_LARGE`	Password exceeds 256 chars (no real-world use)	Pre-truncate
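Callers can mirror these checks client-side to avoid a round trip. The following sketch maps the conditions in the table to their codes; `validateInput` and the trimmed-down `ToolInput` type are hypothetical, and the order in which the real tool evaluates the checks is an assumption.

```typescript
type ErrorCode = "INPUT_EMPTY" | "INPUT_INVALID_TYPE" | "INPUT_TOO_LARGE";

interface ToolInput {
  mode: "password" | "policy";
  password?: string;
  policy?: { minLength: number };
}

// Returns the error code the table describes, or null when the input is valid.
function validateInput(input: ToolInput): ErrorCode | null {
  if (input.mode === "password") {
    if (!input.password) return "INPUT_EMPTY";
    if (input.password.length > 256) return "INPUT_TOO_LARGE";
    return null;
  }
  if (!input.policy) return "INPUT_EMPTY";
  const { minLength } = input.policy;
  if (minLength < 1 || minLength > 128) return "INPUT_INVALID_TYPE";
  return null;
}
```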

When NOT to use this tool

For high-value systems (banking, healthcare, secrets-manager root passwords), entropy math is only the baseline. Real-world password security depends on the hashing algorithm (bcrypt/scrypt/Argon2 cost parameters), the salt strategy, rate limiting on login attempts, the MFA layer above the password, and breached-password screening. The tool audits the policy; the runtime hardening is a separate stack.

For credential-stuffing defence, the password-strength estimate is not the right primitive — strong passwords still get reused. Implement breached-password screening, MFA, and login rate-limiting; password strength alone doesn't help against credential stuffing.

Performance notes

Typical execution: under 5ms in either mode. The entropy calculation is a closed-form expression; the policy auditor runs the NIST checklist linearly. The tool is deterministic (the same inputs always produce the same output), so REST responses are Edge-Cache eligible.

The crack-time estimates assume 10^12 hashes/second (a high-end GPU cluster, mid-2026 typical). Faster hashrates (ASICs, future hardware) shrink the estimate; slower hashes (Argon2 with high cost) extend it.
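The relationship between entropy, hashrate, and crack time is plain arithmetic: an exhaustive search covers at most 2^bits guesses. The sketch below shows only that arithmetic, using the hashrate stated above; `worstCaseCrackSeconds` is a hypothetical helper, and the tool's published estimates may rest on additional assumptions that this does not model.

```typescript
const HASHES_PER_SECOND = 1e12;

// Time to try every candidate in a space of 2^bits at the given guess rate.
// An attacker succeeds after half of this on average.
function worstCaseCrackSeconds(bits: number, rate: number = HASHES_PER_SECOND): number {
  return 2 ** bits / rate;
}

console.log(worstCaseCrackSeconds(40));      // about 1.1 seconds
console.log(worstCaseCrackSeconds(40, 1e4)); // about 1.1e8 seconds against a slow hash
```

Dropping the guess rate from 10^12 to 10^4 per second, as a well-tuned memory-hard hash can, stretches the same 40-bit search from about a second to several years.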