After 100 “make a cat instead” edit requests…

gpt-image-2 edits - Image Input Pricing

gpt-image-2 edit inputs are not billed with the old gpt-image-1 / gpt-image-1.5 512 px tile algorithm.

For edit input images, gpt-image-2 appears to use the same general 32 px patch accounting family used by patch-based vision models, with a 1,536 patch budget, but with image-generation-specific modifications inserted before the ordinary patch-budget resize.

The important difference is:

gpt-image-2 magnifies smaller edit input images into a 1024 px conditioning canvas, caps the latent canvas aspect ratio to 1:3 through 3:1, and then applies the familiar patch-budget limit. This is a better interpretation of what’s happening than a model based on smaller patches.

Requested output size and quality do not change the input image token count. They belong to output-image generation pricing, not edit-input image pricing.

What changed from the published patch algorithm

Published patch-based vision pricing starts from this idea:

original_patch_count = ceil(width / 32) * ceil(height / 32)

Then, if the model patch budget or maximum dimension is exceeded, the image is resized down proportionally and counted again.

For gpt-image-2 edits, the same 32 px patch basis is used, but three unpublished steps are inserted.

Model sizing behavior

Model / endpoint	Patch size	Patch budget	Maximum dimension downscale	Small-image magnification	Aspect-ratio billing canvas
`gpt-image-2` edits	`32 x 32`	`1,536`	Not observed / not applied as a 2048 px cap	Up to `2x`, targeting a 1024 px longest side	Capped to `1:3` through `3:1`
Standard 1,536-patch vision models	`32 x 32`	`1,536`	Usually `2048` px	None	None
Higher-budget patch models	`32 x 32`	model/detail-specific	model/detail-specific	None	None

For gpt-image-2, the 1024 px value is not a maximum input dimension. It is the target longest side for magnifying small and medium inputs before patch counting. For 3:1 images, the useful upload cap is approximately 2112 x 704. Any larger dimension maximum that might come first is not observable.

Replacement algorithm with gpt-image-2 insertions

Let:

patch_size = 32
patch_budget = 1536

For ordinary patch models, begin at Step A.

For gpt-image-2 edits, insert Step 0 first.

Step 0. gpt-image-2 edit input magnification

This step applies only to gpt-image-2 edit input images.

if model is gpt-image-2 edits:
    long_side = max(width, height)
    scale = min(2, max(1, 1024 / long_side))

    effective_width = floor(width * scale)
    effective_height = floor(height * scale)
else:
    effective_width = width
    effective_height = height

Equivalently:

if long_side <= 512:
    effective image is 2x the original size
elif 512 < long_side < 1024:
    effective image is scaled so the longest side becomes 1024 px
else:
    effective image is the original size

The middle branch uses floor/truncation for the scaled integer dimensions.

Examples:

256 x 256  -> effective 512 x 512
768 x 768  -> effective 1024 x 1024
1023 x 512 -> effective 1024 x 512
1024 x 512 -> effective 1024 x 512
1025 x 512 -> effective 1025 x 512

This is why the observed behavior is non-monotonic near 1024 px: as an image approaches 1024 px, the magnification decreases.

Step A. Count 32 px patches

Use the effective dimensions from Step 0.

patch_width = ceil(effective_width / 32)
patch_height = ceil(effective_height / 32)

patch_count = patch_width * patch_height

Examples:

256 x 256:
    effective size = 512 x 512
    patch grid = 16 x 16
    image tokens = 256

768 x 768:
    effective size = 1024 x 1024
    patch grid = 32 x 32
    image tokens = 1024

1025 x 512:
    effective size = 1025 x 512
    patch grid = 33 x 16
    image tokens = 528

Step B. gpt-image-2 aspect-ratio canvas cap

This step applies only to gpt-image-2 edit input images.

After the initial patch grid is computed, the billing canvas is padded in patch space until the latent canvas aspect ratio is within 1:3 through 3:1.

if model is gpt-image-2 edits:
    if patch_width > 3 * patch_height:
        patch_height = ceil(patch_width / 3)

    if patch_height > 3 * patch_width:
        patch_width = ceil(patch_height / 3)

This is a billing/latent-canvas rule. It should not be assumed to mean that the API literally pads the uploaded image with transparent or colored pixels before vision encoding.

Examples:

1536 x 512:
    patch grid = 48 x 16
    aspect = 3:1
    no padding
    image tokens = 48 * 16 = 768

1537 x 512:
    patch grid = 49 x 16
    aspect exceeds 3:1
    padded patch grid = 49 x 17
    image tokens = 49 * 17 = 833

2048 x 512:
    patch grid = 64 x 16
    aspect exceeds 3:1
    padded patch grid = 64 x 22
    image tokens = 64 * 22 = 1408

Step C. Apply the patch budget

After the gpt-image-2 insertions above, apply the ordinary patch-budget behavior.

If the resulting patch grid is within budget, the image token count is:

image_tokens = patch_width * patch_height

If the resulting patch grid exceeds the patch budget, resize the effective billing canvas proportionally until it fits within the budget, then adjust the scale so the final integer patch coverage remains within the cap.

For gpt-image-2 edits:

patch_budget = 1536

Examples:

1536 x 1536:
    effective size = 1536 x 1536
    initial patch grid = 48 x 48 = 2304
    over budget
    budgeted patch grid = 39 x 39
    image tokens = 1521

2048 x 1024:
    effective size = 2048 x 1024
    initial patch grid = 64 x 32 = 2048
    over budget
    budgeted patch grid = 54 x 27
    image tokens = 1458

4096 x 2048:
    effective size = 4096 x 2048
    initial patch grid = 128 x 64 = 8192
    over budget
    budgeted patch grid = 54 x 27
    image tokens = 1458

Compact metacode version

This is the patch algorithm with gpt-image-2 insertions shown explicitly.

patch_size = 32

if model is gpt-image-2 edits:
    patch_budget = 1536

    long_side = max(width, height)
    scale = min(2, max(1, 1024 / long_side))

    width = floor(width * scale)
    height = floor(height * scale)

else:
    patch_budget = model_patch_budget
    apply ordinary model maximum-dimension rules

patch_width = ceil(width / patch_size)
patch_height = ceil(height / patch_size)

if model is gpt-image-2 edits:
    if patch_width > 3 * patch_height:
        patch_height = ceil(patch_width / 3)

    if patch_height > 3 * patch_width:
        patch_width = ceil(patch_height / 3)

if patch_width * patch_height > patch_budget:
    resize the billing canvas proportionally to fit patch_budget
    adjust the scale to stay under budget after integer patch coverage

image_tokens_before_multiplier = final_patch_width * final_patch_height

Coding it up - Python

Client-owned rule table

I’d keep model routing outside the calculator. In an edits API client, the table can be small, since only gpt-image-2 is considered “patches” at all:

IMAGE_INPUT_PATCH_RULES = {
    "gpt-image-2": {
        "maximum_patches": 1536,
        "maximum_size": None,
        "upscale_target": 1024,
        "maximum_upscale": 2,
        "maximum_aspect_ratio": 3,
    },
}

If you also use the same calculator for ordinary patch-based vision models elsewhere, those product clients can own their own rows, or direct to a classifier, for example:

STANDARD_1536_PATCH_RULES = {
    "maximum_patches": 1536,
    "maximum_size": 2048,
    "upscale_target": None,
    "maximum_upscale": 1,
    "maximum_aspect_ratio": None,
}

We have a function that anticipates the changes as parameters possibly reused or re-tuned again.

The calculator itself does not need to know model names. You can map gpt-5.2+ models, the re-defined “high” budget, or highest “original” to the “class” they belong to.

Self-contained token-count function

This function returns the image-token count before any model-specific pricing multiplier. For gpt-image-2 edits, this is the observed input image token count, no multiplier. The higher pricing of input image tokens itself vs text input gives an effective 1.6x multiplier.

def patch_image_token_count(
    width: int,
    height: int,
    *,
    maximum_patches: int = 1536,
    maximum_size: int | None = None,
    upscale_target: int | None = 1024,
    maximum_upscale: int | float = 2,
    maximum_aspect_ratio: int | None = 3,
) -> int:
    """Return the patch-count image tokens before pricing multipliers.

    The defaults describe the observed gpt-image-2 edits image-input rules:
    32 px patches, 1536 maximum patches, up to 2x small-image upscaling
    toward a 1024 px longest side, and a 3:1 patch-canvas aspect limit.

    Set upscale_target=None, maximum_upscale=1, and maximum_aspect_ratio=None
    for ordinary patch-based vision behavior.
    """
    from math import ceil, floor, sqrt

    rules = {
        "patch_size": 32,
        "maximum_patches": maximum_patches,
        "maximum_size": maximum_size,
        "upscale_target": upscale_target,
        "maximum_upscale": maximum_upscale,
        "maximum_aspect_ratio": maximum_aspect_ratio,
    }

    def ceil_div(numerator: int, denominator: int) -> int:
        return -(-numerator // denominator)

    def ceil_unit(value: float) -> int:
        return max(1, ceil(value - 1e-12))

    def effective_size(source_width: int, source_height: int) -> tuple[int, int]:
        target = rules["upscale_target"]
        upscale = float(rules["maximum_upscale"])

        if target is None or upscale <= 1:
            return source_width, source_height

        long_side = max(source_width, source_height)
        scale = min(upscale, max(1.0, target / long_side))

        return max(1, floor(source_width * scale)), max(
            1,
            floor(source_height * scale),
        )

    def patch_grid(pixel_width: int, pixel_height: int) -> tuple[int, int]:
        patch_size = rules["patch_size"]

        return (
            ceil_div(pixel_width, patch_size),
            ceil_div(pixel_height, patch_size),
        )

    def billing_canvas(
        pixel_width: int,
        pixel_height: int,
    ) -> tuple[int, int, int, int]:
        patch_size = rules["patch_size"]
        aspect = rules["maximum_aspect_ratio"]

        patches_wide, patches_high = patch_grid(pixel_width, pixel_height)
        canvas_width = pixel_width
        canvas_height = pixel_height

        if aspect is not None:
            if patches_wide > aspect * patches_high:
                patches_high = ceil_div(patches_wide, aspect)
                canvas_width = patches_wide * patch_size
                canvas_height = patches_high * patch_size
            elif patches_high > aspect * patches_wide:
                patches_wide = ceil_div(patches_high, aspect)
                canvas_width = patches_wide * patch_size
                canvas_height = patches_high * patch_size

        return patches_wide, patches_high, canvas_width, canvas_height

    def fit_scale(
        canvas_width: int,
        canvas_height: int,
        patches_wide: int,
        patches_high: int,
    ) -> float:
        patch_size = rules["patch_size"]
        patch_limit = rules["maximum_patches"]
        size_limit = rules["maximum_size"]

        scale = 1.0

        if size_limit is not None:
            scale = min(scale, size_limit / max(canvas_width, canvas_height))

        if patches_wide * patches_high > patch_limit:
            area_scale = sqrt(
                (patch_size * patch_size * patch_limit)
                / (canvas_width * canvas_height)
            )
            scale = min(scale, area_scale, 1.0)

        scaled_wide = canvas_width * scale / patch_size
        scaled_high = canvas_height * scale / patch_size

        if ceil_unit(scaled_wide) * ceil_unit(scaled_high) <= patch_limit:
            return scale

        return scale * min(
            floor(scaled_wide) / scaled_wide,
            floor(scaled_high) / scaled_high,
        )

    effective_width, effective_height = effective_size(width, height)

    patches_wide, patches_high, canvas_width, canvas_height = billing_canvas(
        effective_width,
        effective_height,
    )

    patch_count = patches_wide * patches_high
    size_exceeded = (
        rules["maximum_size"] is not None
        and max(canvas_width, canvas_height) > rules["maximum_size"]
    )

    if patch_count <= rules["maximum_patches"] and not size_exceeded:
        return patch_count

    scale = fit_scale(
        canvas_width,
        canvas_height,
        patches_wide,
        patches_high,
    )

    return (
        ceil_unit(canvas_width * scale / rules["patch_size"])
        * ceil_unit(canvas_height * scale / rules["patch_size"])
    )

Usage

rules = IMAGE_INPUT_PATCH_RULES["gpt-image-2"]  # model table lookup

tokens = patch_image_token_count(2048, 1024, **rules)

Or, with the gpt-image-2 edits defaults, we validate vs some actual usage discovered:

assert patch_image_token_count(1, 1) == 1
assert patch_image_token_count(16, 16) == 1
assert patch_image_token_count(17, 17) == 4

assert patch_image_token_count(256, 256) == 256
assert patch_image_token_count(384, 384) == 576
assert patch_image_token_count(512, 512) == 1024
assert patch_image_token_count(768, 768) == 1024
assert patch_image_token_count(1024, 1024) == 1024
assert patch_image_token_count(1536, 1536) == 1521
assert patch_image_token_count(2048, 2048) == 1521

assert patch_image_token_count(1023, 512) == 512
assert patch_image_token_count(1024, 512) == 512
assert patch_image_token_count(1025, 512) == 528
assert patch_image_token_count(1536, 512) == 768
assert patch_image_token_count(1537, 512) == 833
assert patch_image_token_count(2048, 512) == 1408
assert patch_image_token_count(2049, 512) == 1430

assert patch_image_token_count(2048, 1024) == 1458
assert patch_image_token_count(4096, 2048) == 1458

Full observed test list

For code port verification, for example

test_cases: list[tuple[tuple[int, int], int, str]] = [
    ((640, 512), 832, "512 < max < 1024 resize branch"),
    ((800, 600), 768, "mid-size non-square resize-to-512 branch"),
    ((1023, 768), 768, "last pixel before 1024 long-side branch"),
    ((1023, 1023), 1024, "square just before 1024 long-side branch"),
    ((1025, 768), 792, "first pixel after 1024 long-side branch"),
    ((1025, 1025), 1089, "square just after 1024 long-side branch"),
    ((1057, 1057), 1156, "half-scale rounding crosses 16px grid"),
    ((1216, 1216), 1444, "square just below budget knee"),
    ((1217, 1217), 1521, "square just above budget knee"),
    ((1280, 1280), 1521, "square budget shrink from 40x40 grid"),
    ((2048, 2048), 1521, "large square plateau check"),
    ((1569, 512), 850, "wide latent-aspect cap after rounding edge"),
    ((1633, 512), 936, "wide latent-aspect cap short-side ceil step"),
    ((1953, 512), 1302, "wide latent-aspect cap near 2048 region"),
    ((2304, 512), 1452, "aspect cap followed by budget shrink"),
    ((1664, 832), 1352, "2:1 below budget knee"),
    ((1696, 848), 1431, "2:1 near budget knee, still under"),
    ((1760, 880), 1458, "2:1 just over grid budget"),
    ((3072, 768), 1452, "large 4:1 aspect/budget order check"),
    ((3072, 1024), 1452, "large 3:1 scale-depth check"),
    ((1, 1), 1, "absolute tiny square; detects any hidden minimum upscale"),
    ((8, 8), 1, "below one 16px patch in both dimensions"),
    ((15, 15), 1, "last square before exact 16px patch boundary"),
    ((16, 16), 1, "exactly one 16px patch"),
    ((17, 17), 4, "first square requiring 2x2 latent patches"),
    ((31, 31), 4, "inside 2x2 latent patch bucket"),
    ((32, 32), 4, "exact 2x2 latent patch boundary"),
    ((33, 33), 9, "first square requiring 3x3 latent patches"),
    ((64, 64), 16, "small power-of-two square, 4x4 latent patches"),
    ((128, 128), 64, "larger small square, 8x8 latent patches"),
    ((255, 255), 256, "just below already-tested 256 square"),
    ((257, 257), 289, "just above already-tested 256 square"),
    ((16, 1), 1, "one-pixel short side, still one latent patch tall"),
    ((17, 1), 2, "one-pixel short side with 2x1 latent patches"),
    ((48, 1), 3, "exact 3:1 latent aspect ratio, no padding expected"),
    ((49, 1), 8, "first tiny case requiring 3:1 aspect padding"),
    ((97, 1), 21, "later tiny aspect-padding ceil step"),
    ((512, 1), 352, "largest no-resize one-pixel-high image"),
    ((513, 1), 352, "resize-to-512 branch with one-pixel short side"),
    ((1025, 1), 363, "half-scale branch and half-up rounding with tiny side"),
]

for (width, height), expected, description in test_cases:
    actual = patch_image_token_count(width, height)
    assert actual == expected, (
        f"{width}x{height}: expected {expected}, got {actual}; {description}"
    )

Self-contained recommended upload-size function

This second function is deliberately standalone. It does not call the token-count function above, so it can be copy-pasted by itself.

It returns dimensions that avoid predictable server-side downscaling under the same patch accounting rules. It never recommends upscaling the source, as that won’t save transmission bytes.

Aspect-ratio padding for the 3:1 / 1:3 canvas rule is treated as a billing-canvas behavior only. We don’t know if “padding” is true or seen. This function preserves the original image aspect ratio and does not add padding.

def recommended_patch_upload_size(
    width: int,
    height: int,
    *,
    maximum_patches: int = 1536,
    maximum_size: int | None = None,
    upscale_target: int | None = 1024,
    maximum_upscale: int | float = 2,
    maximum_aspect_ratio: int | None = 3,
) -> tuple[int, int]:
    """Return upload dimensions that avoid predictable server downscaling.

    The defaults describe the observed gpt-image-2 edits image-input rules.
    The returned size preserves the original image aspect ratio and never
    upscales the source image.
    """
    from math import ceil, floor, sqrt

    rules = {
        "patch_size": 32,
        "maximum_patches": maximum_patches,
        "maximum_size": maximum_size,
        "upscale_target": upscale_target,
        "maximum_upscale": maximum_upscale,
        "maximum_aspect_ratio": maximum_aspect_ratio,
    }

    def ceil_div(numerator: int, denominator: int) -> int:
        return -(-numerator // denominator)

    def ceil_unit(value: float) -> int:
        return max(1, ceil(value - 1e-12))

    def round_half_up(value: float) -> int:
        return max(1, floor(value + 0.5))

    def effective_size(source_width: int, source_height: int) -> tuple[int, int]:
        target = rules["upscale_target"]
        upscale = float(rules["maximum_upscale"])

        if target is None or upscale <= 1:
            return source_width, source_height

        long_side = max(source_width, source_height)
        scale = min(upscale, max(1.0, target / long_side))

        return max(1, floor(source_width * scale)), max(
            1,
            floor(source_height * scale),
        )

    def patch_grid(pixel_width: int, pixel_height: int) -> tuple[int, int]:
        patch_size = rules["patch_size"]

        return (
            ceil_div(pixel_width, patch_size),
            ceil_div(pixel_height, patch_size),
        )

    def billing_canvas(
        pixel_width: int,
        pixel_height: int,
    ) -> tuple[int, int, int, int]:
        patch_size = rules["patch_size"]
        aspect = rules["maximum_aspect_ratio"]

        patches_wide, patches_high = patch_grid(pixel_width, pixel_height)
        canvas_width = pixel_width
        canvas_height = pixel_height

        if aspect is not None:
            if patches_wide > aspect * patches_high:
                patches_high = ceil_div(patches_wide, aspect)
                canvas_width = patches_wide * patch_size
                canvas_height = patches_high * patch_size
            elif patches_high > aspect * patches_wide:
                patches_wide = ceil_div(patches_high, aspect)
                canvas_width = patches_wide * patch_size
                canvas_height = patches_high * patch_size

        return patches_wide, patches_high, canvas_width, canvas_height

    def fit_scale(
        canvas_width: int,
        canvas_height: int,
        patches_wide: int,
        patches_high: int,
    ) -> float:
        patch_size = rules["patch_size"]
        patch_limit = rules["maximum_patches"]
        size_limit = rules["maximum_size"]

        scale = 1.0

        if size_limit is not None:
            scale = min(scale, size_limit / max(canvas_width, canvas_height))

        if patches_wide * patches_high > patch_limit:
            area_scale = sqrt(
                (patch_size * patch_size * patch_limit)
                / (canvas_width * canvas_height)
            )
            scale = min(scale, area_scale, 1.0)

        scaled_wide = canvas_width * scale / patch_size
        scaled_high = canvas_height * scale / patch_size

        if ceil_unit(scaled_wide) * ceil_unit(scaled_high) <= patch_limit:
            return scale

        return scale * min(
            floor(scaled_wide) / scaled_wide,
            floor(scaled_high) / scaled_high,
        )

    effective_width, effective_height = effective_size(width, height)

    patches_wide, patches_high, canvas_width, canvas_height = billing_canvas(
        effective_width,
        effective_height,
    )

    patch_count = patches_wide * patches_high
    size_exceeded = (
        rules["maximum_size"] is not None
        and max(canvas_width, canvas_height) > rules["maximum_size"]
    )

    if patch_count <= rules["maximum_patches"] and not size_exceeded:
        return width, height

    scale = fit_scale(
        canvas_width,
        canvas_height,
        patches_wide,
        patches_high,
    )

    if scale >= 1:
        return width, height

    return round_half_up(width * scale), round_half_up(height * scale)

Upload-size examples

assert recommended_patch_upload_size(1536, 1536) == (1248, 1248)
assert recommended_patch_upload_size(2048, 2048) == (1248, 1248)

assert recommended_patch_upload_size(2048, 1024) == (1728, 864)
assert recommended_patch_upload_size(4096, 2048) == (1728, 864)

assert recommended_patch_upload_size(2304, 512) == (2112, 469)

How to use this in edits API client code

The model routing can stay in the API product layer:

IMAGE_INPUT_PATCH_RULES = {
    "gpt-image-2": {
        "maximum_patches": 1536,
        "maximum_size": None,
        "upscale_target": 1024,
        "maximum_upscale": 2,
        "maximum_aspect_ratio": 3,
    },
}

model = "gpt-image-2"
rules = IMAGE_INPUT_PATCH_RULES[model]

image_tokens = patch_image_token_count(2048, 1024, **rules)
upload_width, upload_height = recommended_patch_upload_size(2048, 1024, **rules)

assert image_tokens == 1458
assert (upload_width, upload_height) == (1728, 864)

This keeps the pricing calculator generic and keeps product/model truth where it belongs: in the client’s model capability table, where you’d have token costs and more parameter gates. If you want to not anticipate the future, go ahead and unroll into a single gpt-image-2 function.

Minimal parameter truth table

These are the parameter classes needed by a generic reference function for coverage of patches caps and behaviors on all vision models seen, where gpt-image-2 should be included directly, not as a little addendum like gpt-image-1 (and with gpt-image-1.5 or mini also not mentioned in documentation.)

Class	`patch_budget`	`max_dimension`	`upscale_target`	`max_upscale`	`aspect_ratio_limit`
`gpt-image-2` edits	`1536`	`None`	`1024`	`2`	`3`
1,536-patch standard vision models	`1536`	`2048`	`None`	`1`	`None`
2,500-patch high-detail models	`2500`	`2048`	`None`	`1`	`None`
10,000-patch original-detail models	`10000`	`6000`	`None`	`1`	`None`

The gpt-image-2 row is the only one with the new behavior. In metacode terms, the new row is equivalent to adding code that accepts tweaks:

if model is gpt-image-2 edits:
    upscale_target = 1024
    max_upscale = 2
    aspect_ratio_limit = 3
    max_dimension = None
    patch_budget = 1536

Everything else can continue to use the ordinary patch-budget calculator. Your edits API cost estimator is where this becomes valuable.

Now I’ve got apps to tune up for this “vision” pricing.
Happy calculating!

AI/ML News & Innovations Hub

OpenAI must document the input image pricing of gpt-image-2 (so I did)

gpt-image-2 edits - Image Input Pricing

What changed from the published patch algorithm

Model sizing behavior

Replacement algorithm with gpt-image-2 insertions

Step 0. gpt-image-2 edit input magnification

Step A. Count 32 px patches

Step B. gpt-image-2 aspect-ratio canvas cap

Step C. Apply the patch budget

Compact metacode version

Coding it up - Python

Client-owned rule table

Self-contained token-count function

Usage

For code port verification, for example

Self-contained recommended upload-size function

Upload-size examples

How to use this in edits API client code

Minimal parameter truth table

OpenAI *must* document the input image pricing of gpt-image-2 (so I did)

gpt-image-2 edits - Image Input Pricing

What changed from the published patch algorithm

Model sizing behavior

Replacement algorithm with gpt-image-2 insertions

Step 0. gpt-image-2 edit input magnification

Step A. Count 32 px patches

Step B. gpt-image-2 aspect-ratio canvas cap

Step C. Apply the patch budget

Compact metacode version

Coding it up - Python

Client-owned rule table

Self-contained token-count function

Usage

For code port verification, for example

Self-contained recommended upload-size function

Upload-size examples

How to use this in edits API client code

Minimal parameter truth table

OpenAI must document the input image pricing of gpt-image-2 (so I did)