After 100 “make a cat instead” edit requests…
gpt-image-2 edits - Image Input Pricing
gpt-image-2 edit inputs are not billed with the old gpt-image-1 / gpt-image-1.5 512 px tile algorithm.
For edit input images, gpt-image-2 appears to use the same general 32 px patch accounting family used by patch-based vision models, with a 1,536 patch budget, but with image-generation-specific modifications inserted before the ordinary patch-budget resize.
The important difference is:
gpt-image-2magnifies smaller edit input images into a 1024 px conditioning canvas, caps the latent canvas aspect ratio to1:3through3:1, and then applies the familiar patch-budget limit. This is a better interpretation of what’s happening than a model based on smaller patches.
Requested output size and quality do not change the input image token count. They belong to output-image generation pricing, not edit-input image pricing.
What changed from the published patch algorithm
Published patch-based vision pricing starts from this idea:
original_patch_count = ceil(width / 32) * ceil(height / 32)
Then, if the model patch budget or maximum dimension is exceeded, the image is resized down proportionally and counted again.
For gpt-image-2 edits, the same 32 px patch basis is used, but three unpublished steps are inserted.
Model sizing behavior
| Model / endpoint | Patch size | Patch budget | Maximum dimension downscale | Small-image magnification | Aspect-ratio billing canvas |
|---|---|---|---|---|---|
gpt-image-2 edits |
32 x 32 |
1,536 |
Not observed / not applied as a 2048 px cap | Up to 2x, targeting a 1024 px longest side |
Capped to 1:3 through 3:1 |
| Standard 1,536-patch vision models | 32 x 32 |
1,536 |
Usually 2048 px |
None | None |
| Higher-budget patch models | 32 x 32 |
model/detail-specific | model/detail-specific | None | None |
For gpt-image-2, the 1024 px value is not a maximum input dimension. It is the target longest side for magnifying small and medium inputs before patch counting. For 3:1 images, the useful upload cap is approximately 2112 x 704. Any larger dimension maximum that might come first is not observable.
Replacement algorithm with gpt-image-2 insertions
Let:
patch_size = 32
patch_budget = 1536
For ordinary patch models, begin at Step A.
For gpt-image-2 edits, insert Step 0 first.
Step 0. gpt-image-2 edit input magnification
This step applies only to gpt-image-2 edit input images.
if model is gpt-image-2 edits:
long_side = max(width, height)
scale = min(2, max(1, 1024 / long_side))
effective_width = floor(width * scale)
effective_height = floor(height * scale)
else:
effective_width = width
effective_height = height
Equivalently:
if long_side <= 512:
effective image is 2x the original size
elif 512 < long_side < 1024:
effective image is scaled so the longest side becomes 1024 px
else:
effective image is the original size
The middle branch uses floor/truncation for the scaled integer dimensions.
Examples:
256 x 256 -> effective 512 x 512
768 x 768 -> effective 1024 x 1024
1023 x 512 -> effective 1024 x 512
1024 x 512 -> effective 1024 x 512
1025 x 512 -> effective 1025 x 512
This is why the observed behavior is non-monotonic near 1024 px: as an image approaches 1024 px, the magnification decreases.
Step A. Count 32 px patches
Use the effective dimensions from Step 0.
patch_width = ceil(effective_width / 32)
patch_height = ceil(effective_height / 32)
patch_count = patch_width * patch_height
Examples:
256 x 256:
effective size = 512 x 512
patch grid = 16 x 16
image tokens = 256
768 x 768:
effective size = 1024 x 1024
patch grid = 32 x 32
image tokens = 1024
1025 x 512:
effective size = 1025 x 512
patch grid = 33 x 16
image tokens = 528
Step B. gpt-image-2 aspect-ratio canvas cap
This step applies only to gpt-image-2 edit input images.
After the initial patch grid is computed, the billing canvas is padded in patch space until the latent canvas aspect ratio is within 1:3 through 3:1.
if model is gpt-image-2 edits:
if patch_width > 3 * patch_height:
patch_height = ceil(patch_width / 3)
if patch_height > 3 * patch_width:
patch_width = ceil(patch_height / 3)
This is a billing/latent-canvas rule. It should not be assumed to mean that the API literally pads the uploaded image with transparent or colored pixels before vision encoding.
Examples:
1536 x 512:
patch grid = 48 x 16
aspect = 3:1
no padding
image tokens = 48 * 16 = 768
1537 x 512:
patch grid = 49 x 16
aspect exceeds 3:1
padded patch grid = 49 x 17
image tokens = 49 * 17 = 833
2048 x 512:
patch grid = 64 x 16
aspect exceeds 3:1
padded patch grid = 64 x 22
image tokens = 64 * 22 = 1408
Step C. Apply the patch budget
After the gpt-image-2 insertions above, apply the ordinary patch-budget behavior.
If the resulting patch grid is within budget, the image token count is:
image_tokens = patch_width * patch_height
If the resulting patch grid exceeds the patch budget, resize the effective billing canvas proportionally until it fits within the budget, then adjust the scale so the final integer patch coverage remains within the cap.
For gpt-image-2 edits:
patch_budget = 1536
Examples:
1536 x 1536:
effective size = 1536 x 1536
initial patch grid = 48 x 48 = 2304
over budget
budgeted patch grid = 39 x 39
image tokens = 1521
2048 x 1024:
effective size = 2048 x 1024
initial patch grid = 64 x 32 = 2048
over budget
budgeted patch grid = 54 x 27
image tokens = 1458
4096 x 2048:
effective size = 4096 x 2048
initial patch grid = 128 x 64 = 8192
over budget
budgeted patch grid = 54 x 27
image tokens = 1458
Compact metacode version
This is the patch algorithm with gpt-image-2 insertions shown explicitly.
patch_size = 32
if model is gpt-image-2 edits:
patch_budget = 1536
long_side = max(width, height)
scale = min(2, max(1, 1024 / long_side))
width = floor(width * scale)
height = floor(height * scale)
else:
patch_budget = model_patch_budget
apply ordinary model maximum-dimension rules
patch_width = ceil(width / patch_size)
patch_height = ceil(height / patch_size)
if model is gpt-image-2 edits:
if patch_width > 3 * patch_height:
patch_height = ceil(patch_width / 3)
if patch_height > 3 * patch_width:
patch_width = ceil(patch_height / 3)
if patch_width * patch_height > patch_budget:
resize the billing canvas proportionally to fit patch_budget
adjust the scale to stay under budget after integer patch coverage
image_tokens_before_multiplier = final_patch_width * final_patch_height
Coding it up - Python
Client-owned rule table
I’d keep model routing outside the calculator. In an edits API client, the table can be small, since only gpt-image-2 is considered “patches” at all:
IMAGE_INPUT_PATCH_RULES = {
"gpt-image-2": {
"maximum_patches": 1536,
"maximum_size": None,
"upscale_target": 1024,
"maximum_upscale": 2,
"maximum_aspect_ratio": 3,
},
}
If you also use the same calculator for ordinary patch-based vision models elsewhere, those product clients can own their own rows, or direct to a classifier, for example:
STANDARD_1536_PATCH_RULES = {
"maximum_patches": 1536,
"maximum_size": 2048,
"upscale_target": None,
"maximum_upscale": 1,
"maximum_aspect_ratio": None,
}
We have a function that anticipates the changes as parameters possibly reused or re-tuned again.
The calculator itself does not need to know model names. You can map gpt-5.2+ models, the re-defined “high” budget, or highest “original” to the “class” they belong to.
Self-contained token-count function
This function returns the image-token count before any model-specific pricing multiplier. For gpt-image-2 edits, this is the observed input image token count, no multiplier. The higher pricing of input image tokens itself vs text input gives an effective 1.6x multiplier.
def patch_image_token_count(
width: int,
height: int,
*,
maximum_patches: int = 1536,
maximum_size: int | None = None,
upscale_target: int | None = 1024,
maximum_upscale: int | float = 2,
maximum_aspect_ratio: int | None = 3,
) -> int:
"""Return the patch-count image tokens before pricing multipliers.
The defaults describe the observed gpt-image-2 edits image-input rules:
32 px patches, 1536 maximum patches, up to 2x small-image upscaling
toward a 1024 px longest side, and a 3:1 patch-canvas aspect limit.
Set upscale_target=None, maximum_upscale=1, and maximum_aspect_ratio=None
for ordinary patch-based vision behavior.
"""
from math import ceil, floor, sqrt
rules = {
"patch_size": 32,
"maximum_patches": maximum_patches,
"maximum_size": maximum_size,
"upscale_target": upscale_target,
"maximum_upscale": maximum_upscale,
"maximum_aspect_ratio": maximum_aspect_ratio,
}
def ceil_div(numerator: int, denominator: int) -> int:
return -(-numerator // denominator)
def ceil_unit(value: float) -> int:
return max(1, ceil(value - 1e-12))
def effective_size(source_width: int, source_height: int) -> tuple[int, int]:
target = rules["upscale_target"]
upscale = float(rules["maximum_upscale"])
if target is None or upscale <= 1:
return source_width, source_height
long_side = max(source_width, source_height)
scale = min(upscale, max(1.0, target / long_side))
return max(1, floor(source_width * scale)), max(
1,
floor(source_height * scale),
)
def patch_grid(pixel_width: int, pixel_height: int) -> tuple[int, int]:
patch_size = rules["patch_size"]
return (
ceil_div(pixel_width, patch_size),
ceil_div(pixel_height, patch_size),
)
def billing_canvas(
pixel_width: int,
pixel_height: int,
) -> tuple[int, int, int, int]:
patch_size = rules["patch_size"]
aspect = rules["maximum_aspect_ratio"]
patches_wide, patches_high = patch_grid(pixel_width, pixel_height)
canvas_width = pixel_width
canvas_height = pixel_height
if aspect is not None:
if patches_wide > aspect * patches_high:
patches_high = ceil_div(patches_wide, aspect)
canvas_width = patches_wide * patch_size
canvas_height = patches_high * patch_size
elif patches_high > aspect * patches_wide:
patches_wide = ceil_div(patches_high, aspect)
canvas_width = patches_wide * patch_size
canvas_height = patches_high * patch_size
return patches_wide, patches_high, canvas_width, canvas_height
def fit_scale(
canvas_width: int,
canvas_height: int,
patches_wide: int,
patches_high: int,
) -> float:
patch_size = rules["patch_size"]
patch_limit = rules["maximum_patches"]
size_limit = rules["maximum_size"]
scale = 1.0
if size_limit is not None:
scale = min(scale, size_limit / max(canvas_width, canvas_height))
if patches_wide * patches_high > patch_limit:
area_scale = sqrt(
(patch_size * patch_size * patch_limit)
/ (canvas_width * canvas_height)
)
scale = min(scale, area_scale, 1.0)
scaled_wide = canvas_width * scale / patch_size
scaled_high = canvas_height * scale / patch_size
if ceil_unit(scaled_wide) * ceil_unit(scaled_high) <= patch_limit:
return scale
return scale * min(
floor(scaled_wide) / scaled_wide,
floor(scaled_high) / scaled_high,
)
effective_width, effective_height = effective_size(width, height)
patches_wide, patches_high, canvas_width, canvas_height = billing_canvas(
effective_width,
effective_height,
)
patch_count = patches_wide * patches_high
size_exceeded = (
rules["maximum_size"] is not None
and max(canvas_width, canvas_height) > rules["maximum_size"]
)
if patch_count <= rules["maximum_patches"] and not size_exceeded:
return patch_count
scale = fit_scale(
canvas_width,
canvas_height,
patches_wide,
patches_high,
)
return (
ceil_unit(canvas_width * scale / rules["patch_size"])
* ceil_unit(canvas_height * scale / rules["patch_size"])
)
Usage
rules = IMAGE_INPUT_PATCH_RULES["gpt-image-2"] # model table lookup
tokens = patch_image_token_count(2048, 1024, **rules)
Or, with the gpt-image-2 edits defaults, we validate vs some actual usage discovered:
assert patch_image_token_count(1, 1) == 1
assert patch_image_token_count(16, 16) == 1
assert patch_image_token_count(17, 17) == 4
assert patch_image_token_count(256, 256) == 256
assert patch_image_token_count(384, 384) == 576
assert patch_image_token_count(512, 512) == 1024
assert patch_image_token_count(768, 768) == 1024
assert patch_image_token_count(1024, 1024) == 1024
assert patch_image_token_count(1536, 1536) == 1521
assert patch_image_token_count(2048, 2048) == 1521
assert patch_image_token_count(1023, 512) == 512
assert patch_image_token_count(1024, 512) == 512
assert patch_image_token_count(1025, 512) == 528
assert patch_image_token_count(1536, 512) == 768
assert patch_image_token_count(1537, 512) == 833
assert patch_image_token_count(2048, 512) == 1408
assert patch_image_token_count(2049, 512) == 1430
assert patch_image_token_count(2048, 1024) == 1458
assert patch_image_token_count(4096, 2048) == 1458
Full observed test list
For code port verification, for example
test_cases: list[tuple[tuple[int, int], int, str]] = [
((640, 512), 832, "512 < max < 1024 resize branch"),
((800, 600), 768, "mid-size non-square resize-to-512 branch"),
((1023, 768), 768, "last pixel before 1024 long-side branch"),
((1023, 1023), 1024, "square just before 1024 long-side branch"),
((1025, 768), 792, "first pixel after 1024 long-side branch"),
((1025, 1025), 1089, "square just after 1024 long-side branch"),
((1057, 1057), 1156, "half-scale rounding crosses 16px grid"),
((1216, 1216), 1444, "square just below budget knee"),
((1217, 1217), 1521, "square just above budget knee"),
((1280, 1280), 1521, "square budget shrink from 40x40 grid"),
((2048, 2048), 1521, "large square plateau check"),
((1569, 512), 850, "wide latent-aspect cap after rounding edge"),
((1633, 512), 936, "wide latent-aspect cap short-side ceil step"),
((1953, 512), 1302, "wide latent-aspect cap near 2048 region"),
((2304, 512), 1452, "aspect cap followed by budget shrink"),
((1664, 832), 1352, "2:1 below budget knee"),
((1696, 848), 1431, "2:1 near budget knee, still under"),
((1760, 880), 1458, "2:1 just over grid budget"),
((3072, 768), 1452, "large 4:1 aspect/budget order check"),
((3072, 1024), 1452, "large 3:1 scale-depth check"),
((1, 1), 1, "absolute tiny square; detects any hidden minimum upscale"),
((8, 8), 1, "below one 16px patch in both dimensions"),
((15, 15), 1, "last square before exact 16px patch boundary"),
((16, 16), 1, "exactly one 16px patch"),
((17, 17), 4, "first square requiring 2x2 latent patches"),
((31, 31), 4, "inside 2x2 latent patch bucket"),
((32, 32), 4, "exact 2x2 latent patch boundary"),
((33, 33), 9, "first square requiring 3x3 latent patches"),
((64, 64), 16, "small power-of-two square, 4x4 latent patches"),
((128, 128), 64, "larger small square, 8x8 latent patches"),
((255, 255), 256, "just below already-tested 256 square"),
((257, 257), 289, "just above already-tested 256 square"),
((16, 1), 1, "one-pixel short side, still one latent patch tall"),
((17, 1), 2, "one-pixel short side with 2x1 latent patches"),
((48, 1), 3, "exact 3:1 latent aspect ratio, no padding expected"),
((49, 1), 8, "first tiny case requiring 3:1 aspect padding"),
((97, 1), 21, "later tiny aspect-padding ceil step"),
((512, 1), 352, "largest no-resize one-pixel-high image"),
((513, 1), 352, "resize-to-512 branch with one-pixel short side"),
((1025, 1), 363, "half-scale branch and half-up rounding with tiny side"),
]
for (width, height), expected, description in test_cases:
actual = patch_image_token_count(width, height)
assert actual == expected, (
f"{width}x{height}: expected {expected}, got {actual}; {description}"
)
Self-contained recommended upload-size function
This second function is deliberately standalone. It does not call the token-count function above, so it can be copy-pasted by itself.
It returns dimensions that avoid predictable server-side downscaling under the same patch accounting rules. It never recommends upscaling the source, as that won’t save transmission bytes.
Aspect-ratio padding for the 3:1 / 1:3 canvas rule is treated as a billing-canvas behavior only. We don’t know if “padding” is true or seen. This function preserves the original image aspect ratio and does not add padding.
def recommended_patch_upload_size(
width: int,
height: int,
*,
maximum_patches: int = 1536,
maximum_size: int | None = None,
upscale_target: int | None = 1024,
maximum_upscale: int | float = 2,
maximum_aspect_ratio: int | None = 3,
) -> tuple[int, int]:
"""Return upload dimensions that avoid predictable server downscaling.
The defaults describe the observed gpt-image-2 edits image-input rules.
The returned size preserves the original image aspect ratio and never
upscales the source image.
"""
from math import ceil, floor, sqrt
rules = {
"patch_size": 32,
"maximum_patches": maximum_patches,
"maximum_size": maximum_size,
"upscale_target": upscale_target,
"maximum_upscale": maximum_upscale,
"maximum_aspect_ratio": maximum_aspect_ratio,
}
def ceil_div(numerator: int, denominator: int) -> int:
return -(-numerator // denominator)
def ceil_unit(value: float) -> int:
return max(1, ceil(value - 1e-12))
def round_half_up(value: float) -> int:
return max(1, floor(value + 0.5))
def effective_size(source_width: int, source_height: int) -> tuple[int, int]:
target = rules["upscale_target"]
upscale = float(rules["maximum_upscale"])
if target is None or upscale <= 1:
return source_width, source_height
long_side = max(source_width, source_height)
scale = min(upscale, max(1.0, target / long_side))
return max(1, floor(source_width * scale)), max(
1,
floor(source_height * scale),
)
def patch_grid(pixel_width: int, pixel_height: int) -> tuple[int, int]:
patch_size = rules["patch_size"]
return (
ceil_div(pixel_width, patch_size),
ceil_div(pixel_height, patch_size),
)
def billing_canvas(
pixel_width: int,
pixel_height: int,
) -> tuple[int, int, int, int]:
patch_size = rules["patch_size"]
aspect = rules["maximum_aspect_ratio"]
patches_wide, patches_high = patch_grid(pixel_width, pixel_height)
canvas_width = pixel_width
canvas_height = pixel_height
if aspect is not None:
if patches_wide > aspect * patches_high:
patches_high = ceil_div(patches_wide, aspect)
canvas_width = patches_wide * patch_size
canvas_height = patches_high * patch_size
elif patches_high > aspect * patches_wide:
patches_wide = ceil_div(patches_high, aspect)
canvas_width = patches_wide * patch_size
canvas_height = patches_high * patch_size
return patches_wide, patches_high, canvas_width, canvas_height
def fit_scale(
canvas_width: int,
canvas_height: int,
patches_wide: int,
patches_high: int,
) -> float:
patch_size = rules["patch_size"]
patch_limit = rules["maximum_patches"]
size_limit = rules["maximum_size"]
scale = 1.0
if size_limit is not None:
scale = min(scale, size_limit / max(canvas_width, canvas_height))
if patches_wide * patches_high > patch_limit:
area_scale = sqrt(
(patch_size * patch_size * patch_limit)
/ (canvas_width * canvas_height)
)
scale = min(scale, area_scale, 1.0)
scaled_wide = canvas_width * scale / patch_size
scaled_high = canvas_height * scale / patch_size
if ceil_unit(scaled_wide) * ceil_unit(scaled_high) <= patch_limit:
return scale
return scale * min(
floor(scaled_wide) / scaled_wide,
floor(scaled_high) / scaled_high,
)
effective_width, effective_height = effective_size(width, height)
patches_wide, patches_high, canvas_width, canvas_height = billing_canvas(
effective_width,
effective_height,
)
patch_count = patches_wide * patches_high
size_exceeded = (
rules["maximum_size"] is not None
and max(canvas_width, canvas_height) > rules["maximum_size"]
)
if patch_count <= rules["maximum_patches"] and not size_exceeded:
return width, height
scale = fit_scale(
canvas_width,
canvas_height,
patches_wide,
patches_high,
)
if scale >= 1:
return width, height
return round_half_up(width * scale), round_half_up(height * scale)
Upload-size examples
assert recommended_patch_upload_size(1536, 1536) == (1248, 1248)
assert recommended_patch_upload_size(2048, 2048) == (1248, 1248)
assert recommended_patch_upload_size(2048, 1024) == (1728, 864)
assert recommended_patch_upload_size(4096, 2048) == (1728, 864)
assert recommended_patch_upload_size(2304, 512) == (2112, 469)
How to use this in edits API client code
The model routing can stay in the API product layer:
IMAGE_INPUT_PATCH_RULES = {
"gpt-image-2": {
"maximum_patches": 1536,
"maximum_size": None,
"upscale_target": 1024,
"maximum_upscale": 2,
"maximum_aspect_ratio": 3,
},
}
model = "gpt-image-2"
rules = IMAGE_INPUT_PATCH_RULES[model]
image_tokens = patch_image_token_count(2048, 1024, **rules)
upload_width, upload_height = recommended_patch_upload_size(2048, 1024, **rules)
assert image_tokens == 1458
assert (upload_width, upload_height) == (1728, 864)
This keeps the pricing calculator generic and keeps product/model truth where it belongs: in the client’s model capability table, where you’d have token costs and more parameter gates. If you want to not anticipate the future, go ahead and unroll into a single gpt-image-2 function.
Minimal parameter truth table
These are the parameter classes needed by a generic reference function for coverage of patches caps and behaviors on all vision models seen, where gpt-image-2 should be included directly, not as a little addendum like gpt-image-1 (and with gpt-image-1.5 or mini also not mentioned in documentation.)
| Class | patch_budget |
max_dimension |
upscale_target |
max_upscale |
aspect_ratio_limit |
|---|---|---|---|---|---|
gpt-image-2 edits |
1536 |
None |
1024 |
2 |
3 |
| 1,536-patch standard vision models | 1536 |
2048 |
None |
1 |
None |
| 2,500-patch high-detail models | 2500 |
2048 |
None |
1 |
None |
| 10,000-patch original-detail models | 10000 |
6000 |
None |
1 |
None |
The gpt-image-2 row is the only one with the new behavior. In metacode terms, the new row is equivalent to adding code that accepts tweaks:
if model is gpt-image-2 edits:
upscale_target = 1024
max_upscale = 2
aspect_ratio_limit = 3
max_dimension = None
patch_budget = 1536
Everything else can continue to use the ordinary patch-budget calculator. Your edits API cost estimator is where this becomes valuable.
Now I’ve got apps to tune up for this “vision” pricing.
Happy calculating!