Bytes, CSS units, LLM tokens: developer-specific unit conversion — obfus.link

1. Insight

The problem this article addresses and why it matters.

Developer units don't fit generic converters

A standard unit converter handles miles to kilometres, Fahrenheit to Celsius, ounces to grams. Developers don't need any of that. Developers need: bytes between IEC binary (KiB, MiB, 1024-based) and SI decimal (KB, MB, 1000-based), CSS units between px / rem / em / vh / vw with a configurable base font size and viewport, and — increasingly — LLM tokens between models with different tokenizer ratios.

The unit-conversion question developers actually have at 11pm on a deploy night is "how many tokens is this prompt for GPT-4 vs Claude vs Llama 3?" — and no generic converter answers it.

Why a developer-specific converter

The tool in this article ships three unit categories generic converters don't cover:

Data sizes with IEC vs SI base toggle. The infamous 1 KB = 1000 bytes vs 1 KiB = 1024 bytes ambiguity is a parameter, not an argument.
CSS units with configurable root font size and viewport dimensions. 1rem is 16px in default Tailwind, 14px in some design systems; the tool takes the system's base size as a parameter.
LLM tokens with model-specific tokenizer ratios. 1000 tokens is about 750 English words for GPT-4 but 680 for Claude — the ratios are different and matter for cost estimation.

What this article delivers

This article walks through conversions in the three categories, with attention to the parameters that matter (binaryBase for IEC vs SI, baseFontSize for CSS, tokenModel for tokens). We cover the LLM token math, which is the most-used feature; the CSS viewport conversions used in responsive design; and the IEC/SI confusion that costs every storage team time.

2. Intent

What you will be able to do after reading.

By the end of this article you will be able to:

Convert data sizes between bytes / KB / MB / GB / TB / PB in either IEC (1024-based) or SI (1000-based) form
Convert CSS units between px / rem / em / vh / vw / % / pt / ch with configurable base font size and viewport
Convert between tokens / characters / words / pages with model-specific tokenizer ratios for GPT-4, Claude, Llama 3, Gemini
Convert time units (ms / seconds / minutes / hours / days) with human-readable output
Pick the right base (IEC vs SI) for data sizes based on which side of "GB or GiB" your downstream expects

The Examples section walks through each category against representative developer scenarios.

3. Examples

Annotated code and worked scenarios.

Before / after: LLM tokens to words for cost estimation

A 1500-token GPT-4 prompt — how many words is that?

devUnitConverter({
  value:      1500,
  from:       'tokens',
  to:         'words',
  category:   'tokens',
  tokenModel: 'gpt-4',
});

// result:    1125
// formatted: '≈ 1,125 words'
// formula:   '1500 tokens × 0.75 words/token (GPT-4) = 1125 words'
// note:      'Token-to-word ratios are approximate estimates'

Same input, Claude:

devUnitConverter({
  value:      1500,
  from:       'tokens',
  to:         'words',
  category:   'tokens',
  tokenModel: 'claude-sonnet',
});

// result: 1020  (0.68 words/token: Claude's ratio yields fewer words per token than GPT-4's 0.75)

The per-model ratio matters for cost estimation. If your prompt is 1500 words, that's 2000 GPT-4 tokens (1500 ÷ 0.75) but about 2,206 Claude tokens (1500 ÷ 0.68), and each provider bills at its own rate per million tokens.
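The words-to-tokens-to-cost arithmetic can be sketched as follows. This is a minimal illustration, not the tool's implementation: `estimateTokens` and `estimateCostUsd` are hypothetical helper names, the ratios are the approximate ones quoted in this article, rounding up is an assumption, and the price is passed in by the caller because no real rates are assumed here.

```typescript
// Words-per-token ratios taken from this article's ratio table (approximate).
const WORDS_PER_TOKEN: Record<string, number> = {
  "gpt-4": 0.75,
  "claude-sonnet": 0.68,
};

// Estimate tokens for a word count; rounds up so budgets err on the high side.
function estimateTokens(words: number, model: string): number {
  const ratio = WORDS_PER_TOKEN[model];
  if (typeof ratio !== "number") {
    throw new Error(`No ratio configured for model: ${model}`);
  }
  return Math.ceil(words / ratio);
}

// usdPerMillionTokens is supplied by the caller; no real prices are assumed.
function estimateCostUsd(
  words: number,
  model: string,
  usdPerMillionTokens: number
): number {
  return (estimateTokens(words, model) / 1000000) * usdPerMillionTokens;
}

console.log(estimateTokens(1500, "gpt-4"));         // 2000
console.log(estimateTokens(1500, "claude-sonnet")); // 2206
```

Because the ratios are estimates, the output is best treated as a budget figure rather than a billing figure.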

Before / after: data sizes with base toggle

A file system reports 32 GB, using the SI definition (32 × 10^9 bytes). Your container's disk allocation is 32 GiB (32 × 2^30 bytes). The GiB figure is about 7.4% larger, a gap of 2,359,738,368 bytes.

devUnitConverter({
  value: 32,
  from:  'GB',
  to:    'bytes',
  category: 'data',
  binaryBase: 'si',
});

// result:    32000000000
// formatted: '32,000,000,000 bytes'

devUnitConverter({
  value: 32,
  from:  'GiB',
  to:    'bytes',
  category: 'data',
  binaryBase: 'iec',
});

// result:    34359738368
// formatted: '34,359,738,368 bytes'

The roughly 7.4% difference (1024^3 vs 1000^3 bytes) matters when you're sizing a 32 GiB volume against a 32 GB disk plan and finding out the volume doesn't fit.
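The gap between the two bases is not constant; it grows with each prefix step. A short sketch of that arithmetic (the helper name `iecOverSiPercent` is illustrative and not part of the tool):

```typescript
const PREFIXES: string[] = ["K", "M", "G", "T", "P"];

// Percentage by which the IEC unit (1024^power bytes) exceeds
// the SI unit (1000^power bytes).
function iecOverSiPercent(power: number): number {
  return (Math.pow(1024, power) / Math.pow(1000, power) - 1) * 100;
}

PREFIXES.forEach((prefix, index) => {
  const gap = iecOverSiPercent(index + 1);
  console.log(`${prefix}iB vs ${prefix}B: ${gap.toFixed(2)}% larger`);
});
// KiB vs KB: 2.40% larger
// MiB vs MB: 4.86% larger
// GiB vs GB: 7.37% larger
// TiB vs TB: 9.95% larger
// PiB vs PB: 12.59% larger
```

At the GB/GiB level used in the example above, 32 GiB exceeds 32 GB by 2,359,738,368 bytes.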

Before / after: CSS rem to px with custom base

devUnitConverter({
  value:    2.5,
  from:     'rem',
  to:       'px',
  category: 'css',
  baseFontSize: 16,
});

// result:    40
// formatted: '40px'

// With a different design system's base:
devUnitConverter({
  value:    2.5,
  from:     'rem',
  to:       'px',
  category: 'css',
  baseFontSize: 14,
});

// result: 35

The baseFontSize parameter is what makes the CSS conversion useful — generic px/rem converters assume 16px, but real codebases vary.
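The underlying arithmetic is a single multiplication or division by the root font size. A minimal sketch, assuming hypothetical helpers `remToPx` and `pxToRem` that default to a 16px base:

```typescript
// rem -> px: multiply by the root font size.
function remToPx(rem: number, baseFontSize: number = 16): number {
  return rem * baseFontSize;
}

// px -> rem: divide by the root font size (which must be positive).
function pxToRem(px: number, baseFontSize: number = 16): number {
  if (baseFontSize <= 0) {
    throw new RangeError("baseFontSize must be positive");
  }
  return px / baseFontSize;
}

console.log(remToPx(2.5));     // 40
console.log(remToPx(2.5, 14)); // 35
console.log(pxToRem(35, 14));  // 2.5
```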

Before / after: viewport-relative units

devUnitConverter({
  value:    100,
  from:     'vh',
  to:       'px',
  category: 'css',
  viewportHeight: 1080,
});

// result: 1080  (100% of viewport height at 1080px tall)

devUnitConverter({
  value:    50,
  from:     'vw',
  to:       'px',
  category: 'css',
  viewportWidth: 1440,
});

// result: 720  (50% of viewport width at 1440px wide)

Useful when verifying a design layout works at specific viewport sizes — render 50vw mentally as 720px on a 1440px viewport without manually multiplying.
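Checking one value against several viewport sizes is a natural extension. The sketch below assumes a hypothetical `viewportUnitToPx` helper and two example viewports chosen only for illustration:

```typescript
interface Viewport {
  width: number;  // px
  height: number; // px
}

// 1vw is 1% of the viewport width; 1vh is 1% of the viewport height.
function viewportUnitToPx(
  value: number,
  unit: "vw" | "vh",
  viewport: Viewport
): number {
  const reference = unit === "vw" ? viewport.width : viewport.height;
  return (value / 100) * reference;
}

const viewports: Viewport[] = [
  { width: 375, height: 667 },
  { width: 1440, height: 1080 },
];

for (const vp of viewports) {
  console.log(`50vw at ${vp.width}px wide = ${viewportUnitToPx(50, "vw", vp)}px`);
}
// 50vw at 375px wide = 187.5px
// 50vw at 1440px wide = 720px
```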

When humans use this

Developers estimating LLM prompt costs use the token converter daily, for questions like "is this prompt close to the 4k token limit?" The CSS converter shows up in design-system discussions when teams switch base font sizes (a 10px base versus the 16px default that Tailwind also uses). The data-size converter settles the recurring "wait, is this GB or GiB?" question that appears in every storage planning conversation.

When agents use this

Two patterns:

Cost-estimation agent. An agent costing out LLM workflows converts word counts to tokens per model, then multiplies by the per-token cost. Pre-call cost projection prevents surprise bills when the prompt is larger than expected.
Design-token converter. An agent generating CSS from design specs (Figma exports, design-system tokens) converts unit values to the target codebase's preferred unit. Rem-based codebases get rem, px-based codebases get px.
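As an illustration of the design-token pattern, the sketch below rewrites px values in a token map to rem and leaves everything else untouched. The token names, the `convertTokens` helper, and the four-decimal rounding are assumptions made for this example, not output from any particular design tool.

```typescript
// Convert a single CSS value such as "24px" to rem; non-px values pass through.
function pxValueToRem(value: string, baseFontSize: number = 16): string {
  const match = /^(-?\d+(?:\.\d+)?)px$/.exec(value.trim());
  if (match === null) {
    return value;
  }
  const rem = Number(match[1]) / baseFontSize;
  // Round to 4 decimals, then drop trailing zeros.
  return `${parseFloat(rem.toFixed(4))}rem`;
}

// Apply the conversion to every entry of a flat token map.
function convertTokens(
  tokens: Record<string, string>,
  baseFontSize: number = 16
): Record<string, string> {
  const converted: Record<string, string> = {};
  for (const name of Object.keys(tokens)) {
    converted[name] = pxValueToRem(tokens[name] as string, baseFontSize);
  }
  return converted;
}

// Hypothetical design tokens.
const designTokens: Record<string, string> = {
  "space-sm": "8px",
  "space-lg": "24px",
  "radius": "0.5rem",
};

console.log(convertTokens(designTokens));
// { 'space-sm': '0.5rem', 'space-lg': '1.5rem', radius: '0.5rem' }
```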

Edge cases

Tokenizer ratio drift

The per-model ratios are approximations. Actual token counts depend on the specific text — code tokenises differently than prose, non-Latin scripts tokenise differently than English. For accurate token counts on a specific input, use the model's actual tokenizer (tiktoken for OpenAI, Anthropic SDK's count_tokens for Claude). The converter is for estimates, not exact counts.

IEC vs SI in the wild

Marketing materials almost always use SI (GB as 10^9 bytes), which gives the bigger, better-looking number. Technical contexts often use IEC (GiB) for clarity. The disk you buy is labelled in SI; what your OS reports depends on the OS: Windows shows 1024-based values under the GB label, macOS shows 1000-based values, and Linux tools vary by command and flag. The tool surfaces the base in the output so the consumer can verify.

CSS unit ambiguity

em is relative to a font size in the element's own context: the element's computed font size for most properties, and the parent's font size when em is used in font-size itself. rem is relative to the root font size only. The tool's CSS conversion treats both as relative to the configured base font size, which is accurate for rem and an approximation for em. For pixel-exact em conversion, use a browser DevTools inspector.
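To see why a single base is only an approximation for em, consider font sizes declared in em on nested elements: each level multiplies the size inherited from its parent. A small sketch with a hypothetical `resolveEmFontSize` helper:

```typescript
// Resolve nested font-size declarations given in em.
// emChain lists the em multipliers from the outermost ancestor inward.
function resolveEmFontSize(rootPx: number, emChain: number[]): number {
  return emChain.reduce((px, em) => px * em, rootPx);
}

// Root is 16px; a parent declares font-size: 1.5em; its child declares 2em.
const actualPx = resolveEmFontSize(16, [1.5, 2]); // 16 * 1.5 * 2
const flatApproximationPx = 2 * 16;                // treats em like rem

console.log(actualPx);            // 48
console.log(flatApproximationPx); // 32
```

The flat calculation is exact only when no ancestor changes the font size, which is the case the tool's approximation covers.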

4. Documentation

Reference signatures, edge cases, and lookup tables.

Input parameters

Field	Type	Required	Default	Description
`value`	`number`	✓	none	The amount to convert
`from`	`string`	✓	none	Source unit (e.g. `'tokens'`, `'GB'`, `'rem'`)
`to`	`string`	✓	none	Target unit
`category`	`'data' | 'tokens' | 'css' | 'time' | 'frequency'`	✓	none	Conversion category
`tokenModel`	`'gpt-4' | 'gpt-4o' | 'claude-sonnet' | 'claude-opus' | 'llama-3' | 'gemini'`	for tokens category	none	Tokenizer ratio model
`baseFontSize`	`number`	for CSS px↔rem	`16`	Root font size in px
`viewportWidth`	`number`	for vw	none	Viewport width in px
`viewportHeight`	`number`	for vh	none	Viewport height in px
`binaryBase`	`'iec' | 'si'`	for data	`'si'`	1024-based (IEC: KiB/MiB) vs 1000-based (SI: KB/MB)

Output shape

{
  result:    number;
  formatted: string;     // '1.44 GB' or '≈ 1,125 words'
  formula:   string;     // human-readable derivation
  note?:     string;     // disclaimers when the conversion is approximate
}
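To make the shape concrete, the sketch below builds a result object for a tokens-to-words conversion. `ConversionResult` mirrors the fields listed above; `tokensToWords` is a hypothetical helper, and rounding to the nearest whole word is an assumption rather than documented behaviour.

```typescript
interface ConversionResult {
  result: number;
  formatted: string;
  formula: string;
  note?: string;
}

function tokensToWords(
  tokens: number,
  wordsPerToken: number,
  modelLabel: string
): ConversionResult {
  // Rounding guards against floating-point noise such as 1020.0000000000001.
  const result = Math.round(tokens * wordsPerToken);
  return {
    result,
    formatted: `≈ ${result.toLocaleString("en-US")} words`,
    formula: `${tokens} tokens × ${wordsPerToken} words/token (${modelLabel}) = ${result} words`,
    note: "Token-to-word ratios are approximate estimates",
  };
}

const sample = tokensToWords(1500, 0.75, "GPT-4");
console.log(sample.formatted); // ≈ 1,125 words
```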

LLM tokenizer ratios (words per token)

Model	Ratio	Source
`gpt-4`	0.75	OpenAI's published estimate (1000 tokens ≈ 750 words English)
`gpt-4o`	0.77	Updated tokenizer (BPE with larger vocabulary)
`claude-sonnet`	0.68	Anthropic's character-per-token average for English
`claude-opus`	0.68	Same tokenizer family as Sonnet
`llama-3`	0.72	tiktoken-based BPE tokenizer (128K vocabulary), Meta's published average
`gemini`	0.73	Google's documented estimate

Supported units by category

data: bytes, KB/KiB, MB/MiB, GB/GiB, TB/TiB, PB/PiB
tokens: tokens, characters, words, pages (1 page = 500 words)
css: px, rem, em, vh, vw, %, pt, ch
time: ms, seconds, minutes, hours, days
frequency: Hz, kHz, MHz, GHz, rpm
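The time category promises human-readable output, but this article does not show its exact format. The sketch below is one plausible reading, with a hypothetical `formatDuration` helper and label choices (`d`, `h`, `min`, `s`, `ms`) picked for illustration.

```typescript
// Unit sizes in milliseconds, largest first.
const DURATION_UNITS: Array<[string, number]> = [
  ["d", 86400000],
  ["h", 3600000],
  ["min", 60000],
  ["s", 1000],
  ["ms", 1],
];

// Break a millisecond count into the largest whole units, skipping zeros.
function formatDuration(ms: number): string {
  if (!isFinite(ms) || ms < 0) {
    throw new RangeError("ms must be a finite, non-negative number");
  }
  let remaining = Math.round(ms);
  const parts: string[] = [];
  for (const [label, size] of DURATION_UNITS) {
    const count = Math.floor(remaining / size);
    if (count > 0) {
      parts.push(`${count}${label}`);
      remaining -= count * size;
    }
  }
  return parts.length > 0 ? parts.join(" ") : "0ms";
}

console.log(formatDuration(5400000));  // 1h 30min
console.log(formatDuration(90061001)); // 1d 1h 1min 1s 1ms
```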

Error codes

Code	When it fires	Recovery
`INPUT_EMPTY`	`value` is 0 or missing	Provide a non-zero value
`INPUT_INVALID_TYPE`	`from` or `to` unit not in the category	Use a unit from the category's supported set
`UNSUPPORTED_FORMAT`	`category` value outside supported set	Use one of the five documented categories
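A caller-side pre-check that mirrors these codes might look like the sketch below. The `validateInput` helper and the order in which the checks run are assumptions; the unit lists are copied from the supported-units section, and the tool's own validation may differ.

```typescript
type ErrorCode = "INPUT_EMPTY" | "INPUT_INVALID_TYPE" | "UNSUPPORTED_FORMAT";

interface ConverterInput {
  value?: number;
  from: string;
  to: string;
  category: string;
}

const SUPPORTED_UNITS: Record<string, string[]> = {
  data: ["bytes", "KB", "KiB", "MB", "MiB", "GB", "GiB", "TB", "TiB", "PB", "PiB"],
  tokens: ["tokens", "characters", "words", "pages"],
  css: ["px", "rem", "em", "vh", "vw", "%", "pt", "ch"],
  time: ["ms", "seconds", "minutes", "hours", "days"],
  frequency: ["Hz", "kHz", "MHz", "GHz", "rpm"],
};

// Returns the matching error code, or null when the input passes all checks.
function validateInput(input: ConverterInput): ErrorCode | null {
  const known = Object.prototype.hasOwnProperty.call(SUPPORTED_UNITS, input.category);
  const units = known ? SUPPORTED_UNITS[input.category] : undefined;
  if (units === undefined) {
    return "UNSUPPORTED_FORMAT";
  }
  if (input.value === undefined || input.value === 0) {
    return "INPUT_EMPTY";
  }
  if (units.indexOf(input.from) === -1 || units.indexOf(input.to) === -1) {
    return "INPUT_INVALID_TYPE";
  }
  return null;
}

console.log(validateInput({ value: 32, from: "rem", to: "bytes", category: "data" }));
// INPUT_INVALID_TYPE
```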

When NOT to use this tool

For exact LLM token counts on a specific input, use the model's actual tokenizer (tiktoken for OpenAI, count_tokens in Anthropic SDK). This tool's per-model ratios are approximations calibrated to English prose; code and non-English text tokenise differently.

For real-time pixel-exact CSS layout debugging, use browser DevTools — the actual rendered pixel value depends on the live DOM context (font inheritance, viewport state, transforms). This tool gives the spec-level conversion.

Performance notes

Typical execution: under 1ms. Pure math, no I/O, fully deterministic. REST responses are Edge-Cache eligible.