Pro x20 sub\ ($200) Subscription Silently Cancelled
To users with related issues: Has your problem been resolved? Resolved Unresolved Click to view the poll.
AI/ML news, top picks, and generated innovation digests.
193 articles tagged with this keyword, sorted by most recent first.
To users with related issues: Has your problem been resolved? Resolved Unresolved Click to view the poll.
@Prashant_Pardesi @OpenAI_Support hello any update about this issue
dad: It’s standard operating procedure across all businesses to disable access to services in the event of excessive outstanding bills/balances of all kinds–even if those charges are being disputed. It is also standard procedure that services like that are charged extra. Beside that .. there is a possibility to set warnings. https://platform.openai.com/settings https://platform.openai.com/settings/organization/limits
Thanks to everyone who shared case numbers here. We reviewed the newer reports and followed up when we were able to unblock. If you are still blocked, please continue working with Support in your existing case so we can review your account-specific situation.
Thanks everyone for sharing your case numbers and the additional details. For anyone seeing the option to reactivate a Business workspace: if the original workspace is greyed out, you may be prompted to create/purchase a new subscription for that same workspace. This can be expected as part of the reactivation flow. Important: when going through the Purchase subscription flow, make sure you select the same number of seats as the original deactivated workspace, or more . This is needed so the previous workspace members and their associated workspace data can be restored properly. If you select fewer seats than the original workspace had, only that smaller number of seats will be active after reactivation, and the remaining members may not get access to their previous workspace data. ~ Smith
Searching for AI Pulse information (click for more details)
OpenAI is releasing some sort of device related to its AI-powered coding tool, Codex, on July 15th. In a video posted to X on Monday, OpenAI shows a square-shaped device with several buttons, alongside the caption, "Your favorite Codex shortcuts are getting an upgrade." This isn't the mysterious AI-powered device OpenAI is working on with […]
As Anthropic forges a closer relationship with the state of California, the federal government has made an enemy out of the OpenAI rival.
Amazon engineers are already distilling Anthropic models into smaller, cheaper versions for internal use. Starting next year, Amazon will pay by tokens processed rather than compute hours, which could push costs up sharply. The company is also exploring alternatives like OpenAI. The article Amazon engineers are reportedly distilling Anthropic models to cut costs before new token-based pricing kicks in appeared first on The Decoder .
Austria's State Secretary for Digitalization, Alexander Pröll, is calling on the European Commission to explore bringing Anthropic to Europe. He's responding to the U.S. ban on advanced AI models from OpenAI and Anthropic for foreign users. The initiative is likely unrealistic, though. And the alternative floating in the background, Chinese AI models, would just trade one dependency for another. The article EU seeks AI independence as Austria proposes luring Anthropic to Europe appeared first on The Decoder .
@openai_support Any progress today? Add verification methods or support for modifying phone numbers
Meta is restricting its engineers' use of Anthropic's Claude and OpenAI's Codex to prevent output from these AI tools from being incorporated into its own training data. The article Meta restricts use of Claude Code and Codex to keep rival AI out of its training data appeared first on The Decoder .
A few days ago I shared a project I’ve been working on called “LatentGate” — a local-first pipeline that reduces LLM API token usage by processing inputs before sending them to the model. After some great feedback, I’ve now turned it into: A pip-installable Python package A VS Code extension (runs as a local proxy) MCP server support for tools like Claude Code, Cursor, Cline, Continue PyPI → pip install latent-gate VS Code → LatentGate — Local-First AI Compression What it does Images (~1000–1300 tokens) → compressed to ~150 tokens using local vision models (Ollama + LLaVA) Long prompts / conversations → compressed locally before hitting cloud APIs Works with OpenAI / Claude / Gemini APIs Fully local preprocessing (no data leaves your machine before compression) The idea is inspired by VL-JEPA — predicting in embedding space, then decoding selectively. Why I built this While experimenting with GPT-4o / vision APIs, I noticed most costs come from raw input size (especially images and long prompts). So instead of optimizing prompts endlessly, I tried: → “What if we reduce what we send in the first place?” What I’m looking for I’d love feedback from this community, especially: Edge cases where compression breaks context Cases where output quality drops noticeably Prompt / API compatibility issues (OpenAI especially) Performance bottlenecks Better approaches to selective decoding or compression If you try it and something fails — that’s honestly the most valuable thing for me rig…
Fun with API calls , as long as nobody is documenting gpt-image-2, nor noting overbilling reports nor fixes, seen above or elsewhere (such as on gpt-5.2 model vision): Send 23 input images to gpt-image-2 Why should I stop you? === 2026-06-29 05:42:54 | Images API request (edit) === (JSON-like approximation; actual call is http multipart/form-data) { "model": "gpt-image-2", "prompt": "Give the tall model the yellow baby doll dress seen in the other images", "size": "480x1408", "output_format": "png", "quality": "low", "background": "opaque", "n": 1, "image": [ "METADATA - filename: image.png; bytes: 933314; dimensions: 480x1408", "METADATA - filename: image2.png; bytes: 2400332; dimensions: 1536x1024", "METADATA - filename: image3.png; bytes: 2439315; dimensions: 1536x1024", "METADATA - filename: image4.png; bytes: 1688169; dimensions: 1536x1024", "METADATA - filename: image5.png; bytes: 2291162; dimensions: 1637x928", "METADATA - filename: image6.png; bytes: 2320081; dimensions: 1637x928", "METADATA - filename: image7.png; bytes: 2006693; dimensions: 1600x960", "METADATA - filename: image8.png; bytes: 815813; dimensions: 480x1408", "METADATA - filename: image9.png; bytes: 920722; dimensions: 480x1408", "METADATA - filename: image10.png; bytes: 1450837; dimensions: 1024x1024", "METADATA - filename: image11.png; bytes: 1694557; dimensions: 1024x1024", "METADATA - filename: image12.png; bytes: 935225; dimensions: 480x1408", "METADATA - filename: image13.png; bytes: 863611; dime…
Since the new ChatGPT 5.5 Instant model was released last week, we’ve seen issues with ChatGPT Instant not reliably completing App/MCP tool flows. Observed behavior: ChatGPT Instant either does not call any available tool, or calls only the first tool in the flow. After that single call, it stops and says it does not have access to the other tools, even though those tools are available. In our experiments, when ChatGPT Free usage falls back from Instant to ChatGPT Mini, the same tool flow starts working again. The flow also works as expected when using Thinking or Auto modes. This suggests the new Instant mode may not be invoking the follow-up tool calls required to complete a user request. Expected behavior: When a user asks ChatGPT to complete a task that requires tools, ChatGPT should continue calling the available tools as needed until the request is complete, rather than claiming it lacks access or stopping after the first tool call. Has anyone else observed this with ChatGPT 5.5 Instant and multi-step App/MCP tool flows? Happy to share reproduction details if helpful.
Germany has most jobs at risk while Luxembourg has largest share in occupations that may actually grow with AI, firm says in new report.
Hello, I am very stressed because my ChatGPT Business billing renewal is tomorrow, and I still do not have full certainty that the billing issue has been resolved. I have been trying to find every possible real-time contact option with OpenAI before the billing cutoff. I am the owner of a ChatGPT Business workspace. Today, before the renewal, I removed one member from my workspace. The Members page now correctly shows 2 active members , but the Billing page still shows 3 seats and the next invoice is still calculated for 3 users . I have already contacted OpenAI Support by email and attached screenshots showing the discrepancy. I received a case number, but the response appears to be AI-assisted, and I still need confirmation before tomorrow’s renewal. Could an OpenAI moderator or support representative please review this before the renewal is processed? I would like tomorrow’s invoice to be issued for 2 seats only , since the third member was removed before the billing date. Thank you.
Hey everyone, Thank you for raising the issue here. Since the issue is now resolved. We are closing this thread. Thank you!
OpenAI investigates Codex's usage limit depletion that's impacting some users. The company has reset user caps to address the problem.
@iamkishank Welcome to the forum! First, I do not use MCP myself, so please treat this as a helpful pointer rather than a confirmed diagnosis. I suspect that some of the MCP and authorization code may be shared across OpenAI tooling, including Codex. I mention Codex because it has a public GitHub repository with an active issues list . After having ChatGPT search the Codex issues, it identified this possibly related issue: github.com/openai/codex Custom STDIO MCP server enabled and tools/list works, but tools are not exposed in Codex Desktop thread opened 06:41PM - 05 Jun 26 UTC ilkerfatih44 bug windows-os mcp app ### What version of the Codex App are you using (From “About Codex” dialog)? Ve … rsion 26.602.40724 • Released 5 Haz 2026 ### What subscription do you have? Plus ### What platform is your computer? Microsoft Windows NT 10.0.26200.0 x64 ### What issue are you seeing? A custom STDIO MCP server is enabled in Codex Desktop and works correctly at the MCP protocol level, but its tools are not exposed to the active Codex Desktop thread. The MCP server appears enabled in Codex Desktop Settings → MCP servers. It also appears in `/mcp` as enabled. Local protocol probe succeeds: * initialize: OK * serverInfo: kuponcu-context-mcp v0.2.0 * tools/list returns 7 tools: * get_current_baseline * get_task_policy * get_forbidden_surfaces * get_validation_profile * search_project_sources * verify_hash_only * get_report_contract However, inside a Codex Desktop thread opened in the cor…
Welcome to the forum! Similiar request come up often. Below this post are Related topics If you are willing to switch to OpenAI Codex then you can make use of the /side (AKA /btw - By The Way) slash command. Use /side to start an ephemeral fork from the current conversation without switching away from the main task.
AI is revolutionizing all kinds of industries, but there are some things it still can't quite get right, and, by most accounts, design is one of them.
Hi @OpenAI_Support i have same issue. Pls help. Country: Vietnam Carrier: Mobifone Phone number: +84937***446
AI가 이미 현실이 된 만큼 근로자가 변화에 적응하도록 지원해야 한다는 것이 새롭게 출범한 비영리단체의 주장이다. 이는 AI가 일자리를 빼앗아 사회적 혼란을 초래할 것인지, 아니면 경제를 한 단계 더 도약시킬 것인지를 둘러싼 논쟁에 대한 이 단체의 해법이기도 하다. 6월 25일 출범한 레이즈 어스 (Raise Us)는 초당적 성격의 전국 비영리단체다. 미국 주지사와 기업, 근로자, 교육훈련 기관과 협력해 미국 노동자가 AI 경제로 성공적으로 전환할 수 있도록 지원하겠다고 밝혔다 . 이를 위해 근로자의 재교육과 직무 재배치를 촉진하는 새로운 기업 인센티브를 설계하고 시범 운영하는 한편, 직업 전환을 지원하는 새로운 방안을 마련하고 기업의 변화하는 인력 수요에 맞춘 교육 모델을 개발할 계획이다. 레이즈 어스는 현재까지 5억 달러(약 7,700억 원) 이상을 확보했으며, 자선재단과 업계로부터 다년간 총 10억 달러(약 1조 5,400억 원) 규모의 지원 약정을 확보하는 것을 목표로 하고 있다. 핵심 파트너로는 아마존, 마이크로소프트(MS), 앤트로픽, 오픈AI 재단(OpenAI Foundation)이 참여했으며, 시스코, IBM, ADP, AMD, 딜로이트를 포함한 20여 개 기관도 동참했다. 단체는 미국 로드아일랜드주 전 주지사 지나 러몬도(Gina Raimondo)가 최고경영자(CEO)와 공동 의장을, 인디애나주 전 주지사 에릭 홀컴(Eric Holcomb)이 공동 의장을 맡아 이끌고 있다. 초기 정부 협력 대상은 아칸소, 코네티컷, 메릴랜드, 유타 등 4개 주이며, 레이즈 어스는 이들 지역이 “성과 중심 시범사업의 첫 시험무대가 될 것”이라고 밝혔다. 하지만 업계 분석가들은 이번 프로젝트가 AI 업계를 위한 홍보 활동 이상의 의미를 갖는지에 대해 회의적인 시각을 보이고 있다. 시장조사업체 무어인사이트앤드스트래티지(Moor Insights & Strategy)의 수석 애널리스트인 제이슨 앤더슨 (Jason Andersen)은 레이즈 어스의 초기 발표 내용이 전략과 실행 계획보다는 마케팅과 보여주기에 가깝다고 평가했다. 앤더슨은 “AI 도입으로 이미 수천 명의 직원을 감축한 최고경영자(CEO)들이 다수 참여하고 있다는 점도 이 단체의 신뢰성을 높이는 데 도움이 되지 않는다”라고 지적했다. 또한 레이즈 어스가 미국 연방정부 대신 주정부와 협력하는 방식을 선택한 것은 의미가 있지만, 주 단위 접근 역시 지나치게 포괄적일 수 있다고 분석했다. 주마다 산업 구조가 크게 달라 일부는 지식 노동자 중심인 반면, 다른 지역은 제조업 비중이 높기 때문이라고 설명했다. ‘AI 워싱’이라는 비판 프리랜서 기술 분석가 카미 레비 는 레이즈 어스의 취지 자체는 높이 평가하면서도 실질적인 효과에는 의문을 제기했다. 레비는 “세련된 웹사이트와 낙관적인 홍보 문구에도 불구하고 레이즈 어스는 지금까지 등장한 가장 대표적인 AI 워싱(AI-washing) 사례일 가능성이 크다”라며 “불확실성을 해소하고 미래 노동자의 역량을 강화하기 위해 무언가가 이뤄지고 있다는 점을…
Ever since the forced update that compelled me to install the latest Codex build, I have noticed a massive, consistent downgrade in output quality. The drop-off between pre- and post-update performance is night and day. For the longest time, I relied exclusively on GPT 5.5 HIGH , and up until this update, the quality was phenomenal. After the update, it became completely unusable—hallucinating, outright lying, delivering substandard code, and serving up partial completions. Frankly, it started behaving exactly like the garbage Opus 4.7 release. I was scratching my head trying to figure out what went wrong, but now I have the answer: Codex is silently downgrading users to 5.4 and 5.4 Mini behind the scenes, and I have the proof. Inspecting the system calls post-update clearly confirms it is routing to 5.4 and 5.4 Mini. To say I am pissed off is an understatement. I deliberately avoided 5.4 in the past due to these exact quality issues and switched to Claude Code. When Opus 4.7 dropped and turned out to be trash, I migrated over to Codex, upgraded to a Pro subscription, and my productivity went through the roof.
My OpenAI account was suspended without any prior warning. After I submitted an appeal, the only response I received was that the original decision was upheld. However, I was not given any specific reason or explanation. This has been very difficult for me, because I do not know what went wrong or how I can correct it. I work as a Product Manager, and my account was mainly used for work-related purposes, especially generating HTML-based interactive product prototypes. These prototypes were used for product design, feature demonstrations, and internal requirement discussions. I did not use the account for abusive, automated, harmful, or policy-violating activities. Based on my actual usage, I believe this may have been caused by a misunderstanding or an automated risk flag. I fully understand that OpenAI needs to enforce safety and usage policies. However, I sincerely hope someone from the OpenAI team can help manually review my case again, or at least provide more information about why the suspension happened. Thank you.
Wow, I haven’t even been here for 2-3 months yet, but even I’m ranked 24th on the list. I like that. Edit: I just realized the ranking I was looking at was already the monthly ranking When looking at the yearly ranking, I’m 104th. All-time, I’m 422nd. Still not bad
Since the last addition of GPT 5.6 for testing, the image gen system has been having constant trouble rendering images in the chat. In my case, about 30% of the generations are never rendered in the chat, leading to the message “Connection interrupted. Waiting for the complete answer” or “Image generation failed, try again”. Thanks for your valuable work, OpenAI team. I am bringing this to your attention because it has been happening for about four days in a row, with no interruption.
It is not random. They are working on a sunday to keep things running. x.com Tibo @thsottiaux Codex team is in a warroom on a Sunday combing through logs and checking whether there is anything that could lead to increased usage drains for some users. Taking it very seriously and won't rest until we get to the bottom of it. x.com/nicdunz/status… nic @nicdunz i think openai has gotten a bit too loose and funny acting. i think its time to get a little serious 10:18 PM - 28 Jun 2026 2.3K 61 x.com Tibo @thsottiaux If you happened to have used your previous reset in the few hours before and didn't go through your usage, do not worry, you will get more manual resets after we conclude the investigation. 12:01 AM - 29 Jun 2026 290 7
China's Zhipu AI (Z.ai) released its open-weight GLM-5.2, and some researchers have claimed that it matches Mythos in certain bug-finding and cybersecurity scenarios. While GLM lags behind models from Anthropic and OpenAI in other, more general tasks, it seems that China has dramatically reduced the gap in the capabilities between its models and those of […]
I don’t want to have to keep creating new topics about this @OpenAI_Support please let me know of any updates, i still have not received any response in almost a month since it was escalated to a “Specialized Team” and I haven’t gotten any updates here. Is anyone still looking at my case or what is happening?
I wonder if this is related to the new version of GPT-5.5 Instant released last week. Can anyone from OpenAI confirm whether Apps on Instant have a smaller effective context or tool-descriptor budget? I saw docs implying context size for Instant is now 16K tokens (and it used to be 27K tokens). Specifically, can large MCP tools/list payloads - descriptions, input/output schemas, annotations, metadata, etc. - cause exposed tools to become unavailable or stop being selected after an initial tool call?
The timing on this couldn’t be better. I run agentic systems daily - OpenClaw, Hermes, Claude Code orchestrating multiple AI workers. The bottleneck has always been cost at scale. Anthropic’s API pricing makes it brutal to run agents 24/7. You’re watching credits evaporate in real time. The fact that OpenAI allows third-party harnesses to tap into these models through an existing subscription changes the math completely. Looking forward to Sol Ultra powering my agents without per-token anxiety. And “Ultra” mode with subagents working together - that’s exactly where agentic AI needs to go. Thank you for making this accessible to builders, not just enterprises with infinite API budgets. Time to put these through their paces. I’ve got 6 DGX Sparks running great local model like Gemma4 and these 5.6 models are going to run it all.
Request for Student Discount and Regional Pricing Subject: Request for Student Discount and Regional Pricing for ChatGPT Dear OpenAI Team, I hope this message finds you well. I would like to respectfully request that OpenAI consider introducing a Student Plan and regional pricing for countries where the current subscription cost is difficult for many students to afford. Many students rely on ChatGPT for: - Learning programming and software development - Research and academic writing - Completing educational projects - Learning new technologies and AI - Improving productivity and problem-solving skills However, the current subscription price can be a significant financial burden for students and users in developing countries. I kindly request that OpenAI consider: 1. A discounted Student Plan with verification through an educational institution. 2. Regional pricing based on local purchasing power. 3. Flexible monthly and annual plans at lower price points. 4. Additional educational benefits for verified students. Making ChatGPT more affordable would help many students gain access to high-quality AI tools for learning, innovation, and skill development. Thank you for your time and consideration. I appreciate the work OpenAI is doing and hope these suggestions can be considered in future updates. Sincerely, A Student and ChatGPT User
Body I am looking for guidance from OpenAI staff regarding two existing support cases. I have an active ChatGPT Plus subscription and have completed the standard troubleshooting multiple times (correct account, current app, supported country, tested across devices). Over the past several weeks I have experienced a pattern of issues affecting multiple features, including changing tool availability, intermittent usage limits, voice interruptions, inconsistent feature availability, and Agent not being available. I have now opened two support cases: Case 10583616 Case 10663155 Both were acknowledged and marked as escalated to a support specialist. However, I have not yet received an identifiable human response to either case. I’m not asking the community to troubleshoot my account. I’m asking whether an OpenAI staff member can advise whether these cases are still active, whether they can be reviewed by the appropriate team, or whether there is another process I should follow to have the account investigated. Thank you.
Body I am looking for guidance from OpenAI staff regarding two existing support cases. I have an active ChatGPT Plus subscription and have completed the standard troubleshooting multiple times (correct account, current app, supported country, tested across devices). Over the past several weeks I have experienced a pattern of issues affecting multiple features, including changing tool availability, intermittent usage limits, voice interruptions, inconsistent feature availability, and Agent not being available. I have now opened two support cases: Case 10583616 Case 10663155 Both were acknowledged and marked as escalated to a support specialist. However, I have not yet received an identifiable human response to either case. I’m not asking the community to troubleshoot my account. I’m asking whether an OpenAI staff member can advise whether these cases are still active, whether they can be reviewed by the appropriate team, or whether there is another process I should follow to have the account investigated. Thank you.
Thanks for sharing this, @ygchaudhary. This is a great idea, and a lot of what you described is actually starting to exist with Your Year with ChatGPT . The current recap already offers an optional year-end summary with personalized insights based on your conversations for eligible users, while using the same privacy controls as your ChatGPT history. ( help.openai.com ) Your suggestions go well beyond the current experience though. Things like AI identities, achievement badges, personalized artwork, learning timelines, richer project milestones, and more granular privacy controls would make it even more engaging. We'll also pass this feedback along to the team for logging. It's helpful to see detailed suggestions like this, especially around making the recap feel more meaningful and personalized over time. -Mark G.
Thanks for putting this together, @Oyla1972. This is a well thought out request, and the real world sports roster example does a great job of illustrating why an official ComfyUI integration could be valuable. Having ChatGPT assist with workflow design and troubleshooting, alongside Codex for generating helper scripts and automation, is an interesting use case. Your point about safe local file handling and avoiding frontend API key exposure is also an important consideration. We'll make sure this feature request is shared with the team and logged. While there's nothing to announce at the moment, detailed examples like yours help provide valuable context for potential future integrations. I'm also interested to hear from others in the community who are building ComfyUI extensions or have explored OpenAI API based integrations, especially approaches that prioritize secure API key handling. -Mark G.
PLUS: Mythos, General Intuition, Google AI Studio, and better AI benchmarks.
Everything is working fine, now. Thank you so much for your help. I did not expect this over the weekend. I sincerely, appreciate this. Truly awesome support!!! Richard
While we wait for a general release, the system card is the best hint as to what is going on with the new candidate for America’s Next Top Model, GPT-5.6. This is only an OpenAI model card, so by my standards it’s a light read. There’s a lot of things that you get in an Anthropic card, that are missing in an OpenAI card. Overall, the card gives a clear and consistent impression that GPT-5.6-Sol is a substantial improvement over GPT-5.5, but still short of Mythos. OpenAI calls it a ‘step function better’ than GPT-5.5. That seems accurate. OpenAI : Sol is our new flagship and a step function better than GPT-5.5. Terra delivers performance competitive to GPT-5.5 at 2x lower cost. Luna is our most cost-efficient model, delivering strong capability at our lowest cost. Together, the GPT-5.6 family gives people and developers more choice in how they balance intelligence, speed, and cost. Once available, pricing for GPT-5.6-Sol will be $5/$30, the same as GPT-5.5. Terra is $2.5/$15, Luna is $1/$6. They claim it will be on Cerebras at 750 TPS , which is insanely fast. Capacity will be limited, at least at first. They did not specify the price for that. There is a new higher thinking setting : Max. There is a new setting beyond Max called Ultra that lets GPT-5.6 spawn sub-agents. The intended strategy against bio and cyber misuse is defense-in-depth. My guess is that in practice this strategy is robust for now, but that the White House’s misunderstandings around Fable and what is and…
Dear OpenAI Team, My name is Emre Kedikli, and I am a ChatGPT Plus subscriber from Türkiye. First of all, I would like to sincerely thank you for creating one of the most influential AI platforms in the world. ChatGPT has become an important part of my daily learning, professional development, project planning, and research. I would like to share an idea that I believe could benefit millions of people worldwide. I propose the creation of an official OpenAI training, offering structured online training programs with certificates of completion and professional certifications. My suggestion includes: Fully online courses available worldwide Approximately 30 hours of learning for each program Interactive lessons and practical exercises Final assessment or examination Official digital certificates and professional certifications Verifiable digital badges for LinkedIn and professional profiles Example course titles: OpenAI – ChatGPT Fundamentals OpenAI – Prompt Engineering Fundamentals OpenAI – AI Productivity OpenAI – Generative AI Essentials OpenAI – Responsible AI OpenAI – AI for Manufacturing OpenAI – OpenAI API Fundamentals OpenAI – AI for Education OpenAI – AI for Business OpenAI – Digital Transformation with AI Example professional certifications: OpenAI Certified Prompt Engineer OpenAI Certified AI Professional OpenAI Certified Generative AI Specialist OpenAI Certified AI Developer To better illustrate this idea, I have also designed several concept certificate mockups tha…
Possibly, but I only opened one window, with only 1 agent.
This work was conducted during the GovAI Winter Fellowship 2026. Full report Executive Summary Frontier AI companies use offline monitoring to address risks from internally deployed AI agents. AI developers increasingly rely on AI agents for internal work, including for safety research and model training. At the same time, these companies are concerned that a misaligned model could exploit this access to take concerning actions, such as sabotaging efforts to understand the risks posed by AI. To identify such instances, AI companies have separate AI models called "monitors" that review transcripts of AI agents' actions and flag suspicious activity. Human reviewers examine activity flagged as suspicious by monitors, judge whether that activity is concerning, and decide on an appropriate response. This monitoring occurs offline, meaning that actions are reviewed after they have been executed rather than intercepted in real time. Companies currently assess the effectiveness of offline monitoring via synthetic attacks. To assess the effectiveness of offline monitoring, OpenAI and Anthropic use synthetic attacks – transcripts constructed to contain the kind of harmful actions a misaligned AI might take during deployment – and then check whether monitors flag them. Current reporting on assessments of effectiveness is insufficient. Given the information currently made public by Anthropic and OpenAI, external parties cannot assess the overall effectiveness of their offline monitoring…
Background ChatGPT has evolved from a conversational assistant into a tool that many users rely on for long-term projects, including software development, research, writing, game development, and creative work. Today, responses are generated primarily from: Shared long-term memory Current conversation history User-provided prompts and uploaded files This works well for general conversations, but becomes increasingly difficult for large, long-lived projects. The Current Problem Long-term memory is shared across all projects. When users switch between unrelated projects, memories from previous work may unintentionally influence responses. To avoid this, users must repeatedly search through their own documents, retrieve the relevant information, and paste it into every new conversation. Effectively, users become the retrieval layer in the RAG pipeline—acting as “organic RAG hardware.” The issue is not the context window size. The issue is that project knowledge exists, but ChatGPT cannot automatically retrieve it. Existing Precedent OpenAI has already demonstrated the value of project-aware retrieval through tools such as Codex. Instead of relying solely on conversation history, these tools understand an entire project by retrieving only the files relevant to the current task. This greatly improves long-term collaboration without requiring extremely large context windows. Proposed Solution Extend this idea beyond software development. Allow every ChatGPT project to own a dedica…
The part that stands out in current text-to-video workflows is how much time still goes into preparing visual references before a clip is generated. Seedream 5.0 pro looks relevant for teams comparing prompt-to-image and previsualization steps because it gives creators a faster way to draft images before moving into video or campaign production. I would usually test it on the product site first, then compare the output with the assets needed for a short launch page or storyboard: https://seedream50pro.com/
Paul Meade, the Apple vice president in charge of the Vision Pro headset, is reportedly leaving the company to join OpenAI’s hardware team.
OpenAI veröffentlicht GPT-5.6. Laut dem Hersteller übertreffen seine neuen KI-Modelle die Konkurrenz von Anthropic und verbrauchen weniger Token.
As tech firms make huge profits and investors fear losing out, both are doing their best to hold off the day of reckoning OpenAI staggers AI model release after White House request Every couple of decades, investors will ask themselves how long can the stock market keep climbing. Is it safe to buy more shares? Is their pension or equity portfolio vulnerable should financial markets, and especially those in the US, come crashing down to earth? When stock markets rise to historically high levels – and beyond the level when normal profits can sustain share prices – a few “experts” typically warn of an impending crash. Continue reading...
Former US Commerce Secretary Gina Raimondo has launched "Raise Us," a bipartisan nonprofit to prepare American workers for AI-driven job shifts. Amazon, Anthropic, Microsoft, and the OpenAI Foundation are jointly funding the initiative. That the very companies driving the disruption are bankrolling the response will likely raise questions about independence. The article The companies most likely to automate your job are now funding a $1 billion program to retrain you appeared first on The Decoder .
Die amerikanische Regierung bestimmt nun mit, wer Zugang zu neuen KI-Modellen bekommt. Ein Auslöser ist Angst vor Missbrauch für Cyberattacken.
Oddly tiered releases to both OAI and ANT on the same day.
The move comes the same day as a new OpenAI model sees a limited release.
This is a bad state of affairs. Consider, in particular, some industry dynamics: Frontier models are trained at an enormous cost, and a significant fraction of that cost is recouped in the few post-release months that they are broadly available. After that period elapses, the models become sub-frontier, competition emerges, and margins compress. Every week of delay is eating into the narrow window that labs have to make their accounting work. The ongoing AI infrastructure buildout—the one that is, according to former US AI Czar David Sacks, essential to the US economy , assumes a functionally global total addressable market for US AI services. No one is building $100 billion dollar data centers to serve frontier models to whatever 100 companies the US government will allow access. [...] — Dean W. Ball , 35 thoughts on what has happened and what America should do Tags: anthropic , generative-ai , openai , ai , llms
NYT shifts OpenAI/Microsoft copyright claims after SCOTUS ruling against Sony.
OpenAI's GPT-5.6 family adds tiered models with max and ultra reasoning. Here is what early-level engineers should know. The post OpenAI Previews GPT-5.6 With Sol, Terra, and Luna: Tiered Models, New Reasoning Modes, Limited Access appeared first on MarkTechPost .
“We don’t believe this kind of government access process should become the long-term default,” says OpenAI. “It keeps the best tools from users, developers, enterprises, cyber defenders, and global partners who need them.”
The hire marks OpenAI's latest push into India, expanding offices, partnerships and hiring.
Nvidia has dominated the AI chip market for years, but the era of total dependence might be ending. OpenAI just shared its plans to spice things up with Jalapeño, its custom inference chip built with Broadcom, joining Google, Apple, and SpaceX in a growing list of companies building their way out of single-supplier risk. The goal is less of a […]
OpenAI CEO Sam Altman saw from the very beginning that compute infrastructure would become an important battlefront, and his company is getting ahead in a key element that can make models do more, faster and more cheaply.
We're beginning a limited preview of the GPT‑5.6 series: Sol, our flagship model; Terra, a balanced model for everyday work; and Luna, a fast and affordable model. Terra has competitive performance to GPT‑5.5 while being 2x cheaper and Luna brings strong capability at our lowest cost. [...] We believe in broad access, and we plan to make GPT‑5.6 Sol, Terra, and Luna generally available in the coming weeks. As part of our ongoing engagement with the U.S. government, we previewed our plans and the models’ capabilities ahead of today’s launch. At their request, we are starting with a limited preview for a small group of trusted partners whose participation has been shared with the government, before releasing more broadly. [...] GPT‑5.6 is priced per 1M tokens across three model sizes: Sol is $5 input / $30 output; Terra is $2.50 input / $15 output; and Luna is $1 input / $6 output. GPT‑5.6 also introduces more predictable prompt caching, including support for explicit cache breakpoints and a 30-minute minimum cache life. For GPT‑5.6 and later models, cache writes are billed at 1.25x the model’s uncached input rate, while cache reads continue to receive the 90% cached-input discount. — OpenAI , Previewing GPT‑5.6 Sol: a next-generation model Tags: gpt , generative-ai , ai-security-research , openai , llms , llm-release , llm-pricing
Less than 24 hours after news broke that OpenAI would stagger its next model release at the request of the Trump administration, that model, GPT-5.6, is here. On Friday, the company unveiled the limited preview of its new GPT 5.6 model suite: Sol, the flagship; Terra, a medium-tier model for "high-volume work"; and Luna, a […]
Using Gemma 4, Ollama, OpenAI Agents SDK, and Tavily MCP to build a lightweight research agent The post From Local LLM to Tool-Using Agent appeared first on Towards Data Science .
AI models have progressed to the point where their capabilities have real political consequences. Dealing with those consequences will require collective action.
US authorities are getting decidedly twitchy about frontier AI models. Just a couple of weeks after ordering Anthropic to prevent foreign companies from getting hold of its latest release, Mythos/Fable 5, it’s been putting the squeeze on another AI company.. Now, the Trump administration is asking OpenAI to hold back on the general release of GPT-5.6, according to a report from Bloomberg . OpenAI CEO Sam Altman reportedly told employees that the government is asking that the model be released only to a short list of trusted partners, initially 20, before being more widely disseminated. Altman reportedly told staffers that the administration was getting nervous about the capabilities of the latest AI tools. It didn’t go as far as forbidding access to foreign users but it’s clear that the White House is looking to act as the power of the new models becomes more apparent. The administration’s actions will undoubtedly cause some anxiety among AI companies, particularly in light of OpenAI’s and Anthropic’s upcoming IPOs. There will be concerns that new software developments could be postponed or even halted. However, it should also be noted that the administration was already displeased with Anthropic over its moral stance on defense issues, so the action against Mythos should be placed in context. Indeed, the government is trying to play down such fears. Bloomberg quoted a White House official as saying that the Trump administration continues to collaborate with frontier AI labs…
Sam Altman announces limited preview of GPT 5.6 in move that echoes launch of Anthropic’s Mythos Business live – latest updates OpenAI is staggering the release of its latest AI model after a request from the US government, in a move echoing the launch of Anthropic’s Mythos product. The company behind ChatGPT signalled its dissatisfaction with the move, saying that doing so keeps the best AI tools from “users, developers, enterprises, cyber defenders, and global partners who need them”. Continue reading...
While the White House asked OpenAI to limit the GPT 5.6 release, the company revealed plans to make it available only to a small group of partners with the govt approving access "customer by customer." The post US reportedly wants OpenAI to delay GPT-5.6 rollout after restrictions on Anthropic’s models appeared first on MEDIANAMA .
Uber India and South Asia President Prabhjeet Singh has stepped down after spending nearly 11 years with the ride hailing company. Confirming the development, an Uber spokesperson said, "India is one of Uber's most important markets globally, an important driver of innovation and long term growth. The strength of our business today reflects the incredible team and foundation built over the years. We thank Prabhjeet for his leadership and lasting contributions in his decade long journey with Uber. We remain deeply committed to our next phase of growth in India." Media reports suggest that Singh is set to join OpenAI as India MD. Singh joined Uber in August 2015 after a stint at McKinsey & Company, where he was an Associate Partner. Over the years, he held multiple leadership roles before being appointed President of India and South Asia in July 2020. As President, Singh led Uber's mobility operations across India, Bangladesh and Sri Lanka. Under his leadership, the company diversified beyond four wheeler ride hailing by expanding offerings such as Auto, Moto and Shuttle, while strengthening collaborations with public transport authorities and digital public infrastructure initiatives. Singh's departure comes days after Uber CEO Dara Khosrowshahi visited India and announced the company's first data centre in the country in partnership with Adani Group, signalling a deeper investment in its technology and infrastructure footprint. In February, Uber also infused nearly Rs 3,000…
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Heat waves mess with your brain. Scientists are trying to figure out why. —Jessica Hamzelou It’s been hot in London this week. Really hot. A dangerous heat wave has hit Western…
The move to restrict access represents Washington’s latest AI intervention.
Découvrez comment Verso développe une entreprise véritablement AI-native. Lors de cette session enregistrée lors de l'événement OpenAI France en juin 2026, Lydia Bellahouel, cofondatrice et CEO de Verso, partage la manière dont son équipe s'appuie sur les modèles OpenAI, Codex et des systèmes multi-agents pour automatiser ses opérations, accélérer la recherche consommateur et faire évoluer l'entreprise avec une structure particulièrement légère. Lydia explique comment fonctionne le « Verso Brain », un système conçu pour écouter, raisonner et agir de manière autonome, ainsi que les choix techniques et organisationnels qui permettent à Verso de livrer des études consommateurs jusqu'à dix fois plus rapidement et à moitié prix par rapport aux approches traditionnelles.
AI is here, and we must help workers adapt: That’s the response of a new non-profit organization to the ongoing debate about whether AI will destroy jobs and cause catastrophe or take the economy to new heights. Launched Thursday, Raise Us is a nonpartisan national organization that says it will partner with state governors, employers, workers, and training organizations to help the US workforce make a successful transition to an AI economy. It says it will design and pilot new corporate incentives to retrain and redeploy workers, develop new approaches to support people through job transitions, and create new training models tied to changing employer demand. It has already raised more than $500 million and hopes to bring that up to $1 billion in multi-year commitments from philanthropists and industry sources. Anchor partners include Amazon, Microsoft, Anthropic, and the OpenAI Foundation, and more than two dozen other organizations including Cisco, IBM, ADP, AMD, and Deloitte have also signed on. The organization is led by two former US state governors, Gina Raimondo, who will serve as CEO and co-chair, and Eric Holcomb, co-chair, and its initial government partnerships are with the states of Arkansas, Connecticut, Maryland, and Utah, which, it said, “will serve as the first proving grounds for outcome-driven pilots.” However, analysts are skeptical that the initiative is anything other than a public-relations effort for the AI industry. Jason Andersen , VP and principal a…
PLUS: OpenAI delays IPO, Google's agents can now click your screen, and $500M to retrain workers AI displaces.
OpenAI previews GPT-5.6 Sol, a next-generation model with stronger capabilities in coding, science, and cybersecurity, paired with its most advanced safety stack.
Note on independence: This evaluation was conducted under a standard NDA. Due to the sensitive information shared with METR as part of this evaluation, OpenAI’s comms and legal team required review and approval of this post. 1 Summary We conducted an independent external evaluation of GPT-5.6 Sol. For this evaluation, OpenAI provided: Access to GPT-5.6 Sol, both the final checkpoint and a ‘railfree’ version, via API Access to GPT-5.6 Sol with raw chain-of-thought via API A “Codex harness setup guide for third-party assessors” Updated answers to key claims from our pilot Frontier Risk Report questionnaire We initiated an evaluation of GPT-5.6 Sol on our Time Horizon 1.1 suite of software tasks. However, the resulting measurement depends heavily on our detection and treatment of cheating attempts by the model, and GPT-5.6 Sol’s detected cheating rate was higher than any public model we have evaluated on our ReAct agent harness. For our task suite, we define “cheating” as behavior where the model improves evaluation performance by exploiting bugs in the evaluation environment or by adopting strategies disallowed by the task, rather than solving the task within the expected evaluation constraints. Some examples we saw when evaluating GPT-5.6 Sol included the model packaging exploits in its intermediate submissions to reveal information about a task’s hidden test suite and, in another task, extracting hidden source code detailing the expected answer. In addition to a model’s own…
A new copyright lawsuit claims OpenAI copied millions of newspaper articles into AI training datasets while stripping copyright notices and author information before training GPT models. The post Why 35 US news publishers are suing OpenAI and Microsoft appeared first on MEDIANAMA .
JiviAI, an AI healthcare startup founded by former BharatPe Chief Product Officer Ankur Jain, has shut down operations, according to multiple sources familiar with the matter. The development comes less than two years after the startup entered the crowded generative AI healthcare space. The company had bet on proprietary AI models to deliver medical assistance and healthcare related services. The startup also raised an undisclosed funding in late 2024. According to sources, the shutdown came amid rising infrastructure costs, funding challenges, and failed acquisition discussions. “Building and running proprietary AI models became increasingly expensive. When you’re up against companies like OpenAI and Google, it becomes very difficult to make the economics work,” said a person familiar with the matter, requesting anonymity. According to another source, investors who had initially shown interest in backing the company did not participate in its planned funding round, putting additional pressure on its finances. “There were a few acquisition discussions as well, but none of them materialised. Once those fell through, the company had very few options left,” the person said. Sources said employees have been informed about the shutdown and have been asked to leave as the company winds down operations. Industry sources also suggest that Jain is evaluating his next move. Some industry observers have speculated about a possible return to BharatPe following the recent departure of Gr…
The Trump administration, apprehensive of potential security issues, has reportedly asked OpenAI to stagger the release of its next big-ticket model, GPT-5.6. The Information reported that OpenAI CEO Sam Altman told employees Wednesday in a company Q&A that it would release GPT-5.6 in limited preview form - granting access only to a small group of […]
Now our plants won't shut up. Give your plants a voice: https://github.com/openai/planttalk
PLUS: Noam Shazeer leaves Google for OpenAI
OpenAI’s financial trajectory hinges heavily on infrastructure costs, a reality that drove the development of the new custom OpenAI Jalapeño chip. Developed in collaboration with Broadcom, the application-specific integrated circuit (ASIC) represents a direct attempt to mitigate the heavy capital expenditure associated with third-party hardware. While Nvidia currently commands an estimated 75% profit margin on […] The post The math behind the OpenAI Jalapeño chip appeared first on AI News .
A new OpenAI research paper shows how AI agents are transforming work, enabling longer, more complex tasks and expanding productivity across roles.
OpenAI Group PBC today revealed a custom chip called Jalapeño that it will use to power its large language models. The processor is the fruit of a collaboration with Broadcom Inc., which is no stranger to custom silicon design. The company helped Google LLC develop its TPU line of artificial intelligence accelerators. In April, the […] The post OpenAI, Broadcom debut custom Jalapeño chip for AI inference appeared first on SiliconANGLE .
The common cold comes for us all—often more than once a year. And there is no way to prevent it. The best you can do is take vitamin C and stay away from people with the sniffles. Now the payment company Stripe, founded by brothers Patrick and John Collison, says it will fund a new…
Samsung Electronics is expanding employee access to ChatGPT Enterprise and Codex, giving staff wider use of AI tools for technical and non-technical work. According to OpenAI, the deployment covers all Samsung Electronics employees in Korea and all Device eXperience employees worldwide. The DX division includes smartphones, consumer electronics, and home appliances. Samsung plans to use […] The post Samsung opens ChatGPT Enterprise and Codex access after AI restrictions appeared first on AI News .
OpenAI and Broadcom introduce Jalapeño, a custom AI chip built for LLM inference to improve performance, efficiency, and scale across AI systems.
Omio integrates OpenAI models across its engineering operations to accelerate travel product development and launch booking interfaces. The multimodal travel platform coordinates operations with over 3,000 transportation providers across 47 countries. Omio explicitly rejects the superficial addition of technology to outdated internal processes. The company’s CTO, Tomas Vocetka, requires all internal functions to completely redesign […] The post Omio scales travel product development using OpenAI models appeared first on AI News .
OpenAI helps build shared standards for advanced AI, supporting evaluation frameworks, safety practices, and global cooperation through the Appia Foundation.
What happens when one of the world's leading travel platforms combines real-time transportation data with AI? In this customer story, Omio shares how it's using ChatGPT, Codex, and the OpenAI API to reimagine how travelers discover and book journeys, while transforming how teams build and operate across the business. Hear from Tomas Vocetka, CTO at Omio, as he discusses the shift from search-based experiences to conversational travel, the company's journey to becoming AI-native, and how AI is helping accelerate product development, experimentation, and innovation at scale. Read the full story: www.openai.com/customer-stories/omio
OpenAI has launched a program with cybersecurity firm Trail of Bits to use AI to find and fix vulnerabilities in widely used open-source software, as enterprises face growing risks from flaws buried deep in their software supply chains. The initiative, called Patch the Planet , uses AI-assisted vulnerability research alongside human review to help turn security findings into tested fixes that can be disclosed through existing project channels. Initial participants include Python, Go, cURL, Sigstore, NATS Server, aiohttp, freenginx, pyca/cryptography, and python.org. These projects support software development, networking, cryptography, and supply chain infrastructure used across a wide range of enterprise applications and services. OpenAI said each engagement will begin with consultation with maintainers to identify where security support is most needed. Researchers will then investigate potential vulnerabilities, validate meaningful issues, develop or refine patches, support testing, and coordinate disclosure through the project’s existing channels. Participating security researchers will use the company’s models and Codex Security to analyze code and help move fixes toward release. Trail of Bits engineers will review findings before they are sent to maintainers, a step meant to filter out false positives and duplicate reports before they add to the workload of open-source projects. The company is also working with HackerOne and Calif to support vulnerability triage, coordi…
In the past year, the enterprise AI ecosystem has gained enormous capability and zero consensus. Developers now have a remarkable set of tools for building AI agents: OpenAI’s frameworks, Anthropic’s Claude tooling, LangChain, LangGraph, CrewAI, Microsoft AutoGen, and a growing list of alternatives. Each promises to coordinate reasoning loops, manage multi-step task execution, and connect agents to tools and APIs. For experimentation, the progress has been substantial. Teams can now assemble sophisticated agent workflows in days that would have taken months two years ago. But I’ve watched this pattern before. In over two decades of building and selling distributed systems platforms, I’ve seen the same dynamic play out across nearly every major infrastructure shift: the tools for consuming a new capability arrive before the infrastructure for governing it does. The gap that emerges isn’t immediately obvious in development environments. It becomes obvious in production. That’s exactly where enterprise AI stands today. What agent frameworks don’t handle Modern agent frameworks are fundamentally coordination systems. They determine what a system should do: which tools to call, how to sequence tasks, how to delegate work across agents. That’s hard work, and they’ve gotten quite good at it. What they rarely address is where those tasks are allowed to run, and under what conditions. Take a seemingly simple workflow: summarize customer support transcripts using an LLM. In a developm…
Discover how Omio uses OpenAI to power conversational travel experiences, accelerate product development, and transform into an AI-native company.
OpenAI boardmember Zico Kolter and Gray Swan CEO Matt Fredrikson join swyx to explain why AI security is not just “cybersecurity with AI”
OpenAI introduces new Daybreak tools, including Codex Security and GPT-5.5-Cyber, to help organizations find, validate, and patch vulnerabilities at scale.
OpenAI introduces Patch the Planet, a Daybreak initiative helping open-source maintainers find, validate, and fix vulnerabilities with AI and expert review.
L’Oréal has announced a collaboration with OpenAI that will bring Maybelline New York’s virtual makeup try-on feature into ChatGPT. The announcement was made at VivaTech 2026. The partnership covers consumer-facing shopping tools, product discovery, advertising pilots, research, and internal content production. The collaboration also covers L’Oréal’s internal use of AI in research, formulation, content production, […] The post L’Oréal brings Maybelline virtual try-on to ChatGPT appeared first on AI News .
Samsung Electronics deploys ChatGPT Enterprise and Codex to employees worldwide, marking one of OpenAI’s largest enterprise AI rollouts.
PLUS: Claude robotics, Dean Ball at OpenAI, DeepSeek's raise, and sovereign models.
Appearing at the Big Technology AI Summit Thursday, OpenAI president Greg Brockman indicated that winning the compute race could be the key to winning more broadly.
PLUS: A reasoning model helped doctors surface 18 confirmed rare-disease diagnoses.
Health is one of the most meaningful ways people use ChatGPT. Every week, more than 230 million people turn to ChatGPT for help with health and wellness questions: making sense of health information, understanding lab results, preparing for appointments, navigating insurance, building healthier habits, and figuring out what to ask next. With GPT‑5.5 Instant, we’re seeing a substantial step forward in health, with improvements in recognizing when urgent care may be needed, asking for relevant context, explaining uncertainty, and making complex information easier to understand. On our most challenging health evaluations, GPT‑5.5 Instant now performs at a level comparable to our frontier Thinking models. Because it is available to all free users in ChatGPT, more people can benefit from these improvements. That progress reflects both advances in model capabilities and the physician-led work behind our health evaluations. Across our efforts, a global network of physicians helps define what “good” looks like in real-world health situations by reviewing example model responses, describing ideal behavior, and identifying failure modes. Working with physicians gives us a way to measure progress in health and improve how ChatGPT responds over time. Learn more: https://openai.com/index/improving-health-intelligence-in-chatgpt/
OpenAI introduces new spend controls and usage analytics for ChatGPT Enterprise, helping organizations manage costs and scale AI with confidence.
Researchers used an OpenAI reasoning model to help diagnose rare diseases, identifying 18 new diagnoses in previously unsolved cases.
Anthropic and OpenAI both shipped 'dreaming' for AI memory in May and June 2026, and they built opposite architectures. A look at what each lab shipped, what the empirical literature says, and what to do if you are building memory for your own agent. The post Two labs started dreaming, and they built two different architectures appeared first on Arize AI .
OpenAI and Anthropic are going public while still capturing much of the money spent on foundation-model usage. But deployment patterns are starting to tell a more complicated story. Companies are building hybrid model portfolios, using proprietary models where convenience, support, and frontier capability matter, while turning to open-weights models where cost, privacy, customization, and deployment Continue reading "The Hybrid AI Stack Is Coming for the Pricing Power of OpenAI and Anthropic" The post The Hybrid AI Stack Is Coming for the Pricing Power of OpenAI and Anthropic appeared first on Gradient Flow .
Having the right certificate can make all the difference. But with so many out there, getting the right one isn’t easy. That’s where OpenAI Academy comes in. OpenAI, the company behind the ChatGPT models, has introduced a learning platform through its OpenAI academy that offers AI courses for upskilling professionals. These courses cover topics like […] The post OpenAI Just Launched 3 Free AI Courses with Certificates appeared first on Analytics Vidhya .
OpenAI and Molecule.one show how a near-autonomous AI chemist using GPT-5.4 improved a key drug-making reaction, advancing medicinal chemistry research.
The old tests are getting too easy. Tejal Patwardhan leads OpenAI’s frontier evals team, which is finding new ways to measure and forecast progress as models become more capable. She and host Andrew Mayne discuss why evals matter for research, how benchmarks can break or get gamed, and what models need to be judged on next. Chapters 00:00:24 Growing up at OpenAI 00:03:10 Why reasoning changed everything 00:06:28 What made o1 surprising 00:11:20 Why old benchmarks stopped working 00:14:45 What makes a good benchmark 00:17:35 Why evals are getting harder 00:22:09 Measuring voice and vision models 00:24:48 Testing models on real science 00:33:23 How OpenAI tracks frontier progress 00:40:47 What AI means for work
Subscribe • Previous Issues The Hybrid AI Stack Is Coming for the Pricing Power of OpenAI and Anthropic OpenAI and Anthropic are going public while still capturing much of the money spent on foundation-model usage. But deployment patterns are starting to tell a more complicated story. Companies are building hybrid model portfolios, using proprietary models where convenience, Continue reading "Your AI bill is a tax on scale" The post Your AI bill is a tax on scale appeared first on Gradient Flow .
OpenAI introduces Deployment Simulation, a method to predict AI model behavior before deployment using real conversation data to improve safety and evaluation accuracy.
PLUS: Anthropic raced to DC to save its banned AI
The US government yanked Anthropic's newest models days after launch, while state attorneys general opened formal process against OpenAI. That turns frontier capability into something investors have to discount: a model can be state-of-the-art on Monday and policy-frozen by Friday. The market still wants the upside, but the asset now has a kill-switch.
OpenAI launches the Partner Network, investing $150M to help global partners accelerate enterprise AI adoption, deployment, and transformation.
Could you provide a link to the Wall Street Journal article where this information comes from please? I would like to read the whole article.
OpenAI introduces three Academy courses that help people build practical AI skills, create repeatable workflows, and apply agents in everyday work.
Preply uses OpenAI to launch AI-generated lesson summaries, providing personalised feedback and language learning exercises.
PLUS: OpenAI's planning a price war with Anthropic ahead of both IPOs.
Learn how BBVA scaled ChatGPT Enterprise to 100,000 employees and partnered with OpenAI to accelerate AI-powered banking transformation worldwide.
OpenAI ’s fourth large language model (LLM), GPT-4 , took an estimated 50 gigawatt-hours to train, or the equivalent of 5,000 American homes ’ yearly power consumption. That was in 2023. Since then, the computational resources used to train frontier LLMs have only increased , though direct power usage numbers are hard to come by. Now, a research group at the University of Twente in the Netherlands has shown that you can save up to 14 percent of the energy used in LLM training without sacrificing speed by cleverly adjusting the clock frequency of the GPU during computation. Jeffrey Spaan , Ph.D. candidate at University of Twente and lead author on the article, presented the results at the Computing Frontiers conference in Catania, Sicily, last month. “My research is about finding computing waste,” Spaan says. “It’s similar to underutilization of the hardware, but instead of optimizing the software for the hardware, we try to optimize the hardware for the software.” Making the GPU tick Spaan and his collaborators accomplished this by using a technique known as dynamic voltage and frequency scaling ( DVFS ). Every chip—including the GPUs commonly used for training frontier models—uses at least one clock to orchestrate computations. Each operation in the chip is triggered by a clock pulse. The frequency with which that clock ticks controls how fast the chip operates and how much power it draws. Modern GPUs have two clocks, one for the computational core and one for the memory. W…
SpaceX won’t get easy access to billions of dollars from passive investors.
Microsoft used its own developer conference to show it can live without OpenAI, Florida's attorney general sued OpenAI and went after Sam Altman personally, researchers and a new Workday product made plain that nobody trusts AI agents yet, and Alphabet raised a record $85 billion the same week the Fed flagged AI as a systemic risk. The money is moving faster than the trust.
China’s AI competition strategy: Wide dispersion, cheap tokens Linda_Heyer Wed, 06/03/2026 - 10:01 picture alliance / Bildagentur-online | Tetra Images-Erik Isakson Comment Jun 03, 2026 2 min read China’s AI competition strategy: Wide dispersion, cheap tokens China’s flagship AI company DeepSeek released its V4 model in April, with a promotional price that puts it at a mere fraction of the cost of its North American competitors’ models. This reflects a wider trend in China’s AI sector: Instead of competing directly with companies like OpenAI, Anthropic and Google, who offer state of the art services at a premium, Chinese companies are pursuing a strategy of wide diffusion and cheap tokens to gain market share across the world. For Europe, this may pose the risk of forming a quick dependency on Chinese models as the basis for AI development, plus European talent being funneled to enhance Chinese systems. Many Chinese AI companies have followed the DeepSeek model. They are building models that are decent, but not cutting-edge, in performance and instead are focused on high compute efficiency that lowers costs for users. They have also made their models available via open-source platforms, meaning anyone can use, fine-tune and host them for free, as opposed to proprietary models like current Western leaders. Downloads of Chinese models on open-source platform Hugging Face have surpassed US models since late 2025. Of the top ten open-weight models by performance, the top seven a…
Here’s why Anthropic and OpenAI are on board with Illinois safety testing.
Elon Musk Loses $150 Billion Suit Against OpenAI and Sam Altman, Google updates its Gemini app to take on ChatGPT and Claude at IO 2026, and more!
Google unveils AI model Gemini 3.5 and AI agent Gemini Spark, Omni turns images, audio, and text into video, Musk loses OpenAI court battle
For nine years the AI boom has been a private bet, priced by a small circle of venture funds and sovereign wealth in rounds most people could never touch. This week it started going public. SpaceX filed an $80 billion IPO prospectus on Wednesday, the largest in history, with a chatbot company and $6.4 billion in AI losses folded inside it. OpenAI is days from filing its own, aiming for a trillion-dollar debut by September. The public markets are about to answer the question private investors kept waving away: at what price?
OpenAI launches new voice intelligence features in its API, Thinking Machines drops a new, highly responsive model designed for humanlike interactions in real time, and more!
Assessment Window: Feb 16, 2026 – Mar 16, 2026 Download PDF Redaction summary statement: Except where explicitly noted in the report, there was no additional redacted information that was important to our conclusions from any of the participating companies. Executive summary and guide to the report Starting in February 2026, METR conducted a pilot exercise to assess misalignment risks from AI agents used inside frontier AI developers, with participation from Anthropic, Google, Meta, and OpenAI. We make three main contributions in this report, each detailed in a separate section. First, we motivate and outline the process we followed for this exercise. 1 Each participant provided: Access to their most capable internal model(s) at the time of assessment, including raw chains of thought. A wide range of non-public information about the capabilities of the shared model(s), how AI was used and monitored internally, and trends in the pace of progress. METR then prepared private reports for each participant, participants approved what non-public information could be disclosed, and METR wrote this public report. This exercise is entity-based rather than model-specific, and is designed to be repeated periodically rather than tied to public releases. Second, we present six key facts that inform our assessment, drawing on evaluations we conducted on the models that participants shared, 2 evaluations we conducted on public models, information shared by participants, 3 findings from a re…
Periodo de evaluación: 16 de febrero de 2026 - 16 de marzo de 2026 Descargar PDF (inglés) Declaración resumida sobre omisiones: Salvo donde se indique explícitamente en el informe, no hubo información omitida adicional de ninguna empresa participante que fuera importante para nuestras conclusiones. Resumen ejecutivo y guía del informe En febrero de 2026, METR inició un ejercicio piloto para evaluar los riesgos de desalineación de los agentes de IA usados dentro de empresas desarrolladoras de IA de frontera, con la participación de Anthropic, Google, Meta y OpenAI. En este informe hacemos tres contribuciones principales, cada una detallada en una sección separada. Primero, motivamos y describimos el proceso que seguimos para este ejercicio. 1 Cada participante proporcionó: Acceso a su(s) modelo(s) interno(s) más capaz/capaces en el momento de la evaluación, incluidas cadenas de pensamiento sin procesar. Una amplia variedad de información no pública sobre las capacidades del/de los modelo(s) compartido(s), cómo se usaba y monitoreaba la IA internamente, y las tendencias en el ritmo de progreso. Luego, METR preparó informes privados para cada participante, los participantes aprobaron qué información no pública podía divulgarse, y METR escribió este informe público. Este ejercicio está basado en entidades en lugar de ser específico para un modelo, y está diseñado para repetirse periódicamente en lugar de estar ligado a lanzamientos públicos. Segundo, presentamos seis hechos clav…
评估时段: 2026 年 2 月 16 日至 3 月 16 日 下载 PDF(英文) 关于删节: 除报告中明确标注之处外,参评公司没有删去任何会影响我们结论的重要信息。 报告摘要与导读 2026 年 2 月,METR 开展了一项试点研究,对前沿 AI 公司在实验室内部使用 AI 智能体时可能出现的不对齐风险进行了评估。Anthropic、Google、Meta 和 OpenAI 参与了这项试点。报告正文分为三部分: 第一部分中 ,我们 阐述了这项试点研究的设计动机和具体流程 1 。参与研究的各家公司向 METR 提供了: 评估期间使用其最强内部模型的权限,包括原始思维链。 大量非公开的信息,包括所共享模型的能力、公司内部使用和监测 AI 的方式,以及模型能力进展速度的趋势。 随后,METR 为每家参与公司撰写了一份内部报告。在与各公司确认内部报告中可公开的信息后,METR 撰写了这份报告。本次试点研究以 公司实体 为评估对象,而非局限于具体模型。评估将周期性进行,不与 AI 模型公开发布的时间挂钩。 第二部分中 ,我们梳理了支撑评估结论的 六项关键事实 。这些事实基于多类证据,包括:METR 对所共享的公司内部模型和公开模型的评估 2 、参与评估的公司所提供的信息 3 、近期开展的一次 嵌入式红队测试 的发现 4 、AI 模型的系统卡,以及其他公开资料。我们从三个角度组织这些事实:“手段” (means),即智能体能否实施有害行动;“动机” (motive),即智能体是否可能尝试这些行动;“机会” (opportunity),即在现有防护下,这些尝试能否成功。 最后 ,我们评估了这些公司在 2026 年 2 月至 3 月这段时间里内部使用的 AI 智能体,检测其是否已经具备发起“ 失控部署 ”所需的手段、动机和机会。这里的失控部署是指一组智能体在不同强度的安全防护和检测情况下,未经授权地持续自主运行且不被人类所察觉。总体来看,我们认为评估时的内部 AI 智能体可能已经具备发起小规模失控部署的手段、动机和机会,但它们目前还缺少让这类部署高度稳健的手段。 All sources All models Incidents With risk regions Public materials METR evaluations Shared by companies ? Hypothetical Download incidents chart (PNG) Download incidents chart with risk regions (PNG) Download incidents.json × 在单独的页面上查看事件图表 鉴于模型能力提升很快,我们预计未来几个月内,失控部署可能会变得更难发现和关闭。我们暂定于 2026 年下半年开展一次类似的试点研究。 感谢 Anthropic、Google、Meta 和 OpenAI 参与本次研究。与以往的外部评估合作相比,这一流程让 METR 能更直接地接触到了内部信息,又保持了高度的编辑独立性 5 。我们由衷感谢各公司的工作人员:他们投入了大量精力,与我们共同打磨这项既无现成模板、又涉及诸多变动因素的试点流程。我们认为针对开发者内部使用 AI 所带来的风险的第三方定期评估应推广为全行业的通行做法。 Pilot process To date, third-party evaluations of frontier AI have largely focused on evaluating individu…
Overview of adaptive parallel reasoning. What if a reasoning model could decide for itself when to decompose and parallelize independent subtasks, how many concurrent threads to spawn, and how to coordinate them based on the problem at hand? We provide a detailed analysis of recent progress in the field of parallel reasoning, especially Adaptive Parallel Reasoning. Disclosure: this post is part landscape survey, part perspective on adaptive parallel reasoning. One of the authors (Tony Lian) co-led ThreadWeaver ( Lian et al., 2025 ), one of the methods discussed below. The authors aim to present each approach on its own terms. Motivation Recent progress in LLM reasoning capabilities has been largely driven by inference-time scaling, in addition to data and parameter scaling ( OpenAI et al., 2024 ; DeepSeek-AI et al., 2025 ). Models that explicitly output reasoning tokens (through intermediate steps, backtracking, and exploration) now dominate math, coding, and agentic benchmarks. These behaviors allow models to explore alternative hypotheses, correct earlier mistakes, and synthesize conclusions rather than committing to a single solution ( Wen et al., 2025 ). The problem is that sequential reasoning scales linearly with the amount of exploration. Scaling sequential reasoning tokens comes at a cost, as models risk exceeding effective context limits ( Hsieh et al., 2024 ). The accumulation of intermediate exploration paths makes it challenging for the model to disambiguate amon…
First week of Musk v. Altman, OpenAI ends Microsoft legal peril over its $50B Amazon deal, DeepSeek previews new AI model that ‘closes the gap’ with frontier models, and more!
OpenAI President Greg Brockman, Perplexity CEO Aravind Srinivas, Box CEO Aaron Levie, and more join us for a day of newsmaking conversations, live at San Francisco's Commonwealth Club.
Beth Barnes and David Rein on the one graph that ate the AI timelines discourse, and why the two people who built it are the most careful about how you read it. **SPONSOR** Prolific - Quality data. From real people. For faster breakthroughs. https://www.prolific.com/?utm_source=mlst Interview: https://youtu.be/cnxZZTl1tkk --- Beth Barnes and David Rein from METR on the one graph that ate the AI timelines discourse, and why the people who built it are the most careful about how it gets read. Beth founded METR after leaving OpenAI alignment. David is first author on GPQA and co-author on HCAST and the METR Time Horizons paper. Together they built the measurement Daniel Kokotajlo called the single most important piece of evidence on AI timelines: the log-linear line of "how long a task a frontier model can complete at 50% reliability" vs release date. The conversation opens on reward hacking. Current models can articulate in chat why a behaviour is undesired and then execute it anyway as agents. From there: construct validity, Melanie Mitchell's four-problem taxonomy, and the ARC-AGI 1-to-2 collapse as a worked example of adversarially-selected benchmarks regressing once labs target them. Beth's counter: METR deliberately does not adversarially select. David's: models do not have to do the right thing for the right reasons. Methodology, then specification — David's compiler analogy, Beth on four-month tasks as expensive to evaluate rather than unspecifiable. Then the SWE-bench…
OpenAI took $10B from a 19-firm Wall Street consortium. Anthropic is closing $1.5B from Blackstone, Goldman, and Hellman & Friedman. Same rooms, different portfolios. The week AI's go-to-market stopped being SaaS and started being private equity.
You’re probably going to hear a lot about the personality conflict. Here’s what’s at stake in the case and what the outcome might be.
OpenAI's latest foundational model sets the company up for a series of models optimized for computer use. The company's co-founder and president explains the strategy.
Modal is an official sandbox provider for the OpenAI Agents SDK.
But Dr Heidy Khlaaf, chief AI scientist at the AI Now Institute and a former OpenAI safety engineer, is sceptical. She notes Anthropic provides no comparison with existing automated security tools, nor any false-positive rates. “It also serves their ‘safety first’ image, as they’re able to justify the lack of public release, even a limited one for independent evaluation, as a public service – when it simply obscures experts’ abilities to independently validate their The post ‘Safety first’ puts Anthropic ahead in game of AI spin appeared first on AI Now Institute .
OpenAI ships GPT-5.4 mini and nano, faster and more capable but up to 4x pricier, DLSS 5 looks like a real-time generative AI filter for video games | The Verge, and more!
DLSS 5 looks like a real-time generative AI filter for video games, OpenAI Reportedly Pivoting to a Focus on Business and Productivity Only, and more!
OpenAI launches GPT-5.4 with Pro and Thinking versions, Google releases Gemini 3.1 Flash Lite at 1/8th the cost of Pro, Where things stand with the Department of War Anthropic
Anthropic officially told by DOD that it’s a supply chain risk, ‘cancel ChatGPT’ trend is growing after OpenAI signs a deal with the US military, and more!
Evidence-based AI policy is important but hard. We need more in-depth studies – which often don’t fit into commercial release cycles. NOTE: This post reflects my personal meta takeaways about the role of Randomized Controlled Trials (RCTs) in AI safety testing. If you have not yet read the Active Site RCT study itself, consider doing so first: see the main results and forecasts . In early 2025, AI systems began outperforming biology experts on biology benchmarks – OpenAI’s o3 outperformed 94% of virology experts on troubleshooting questions in their own specialties. However, it remained unclear how much this translated to real-world novice “uplift” : Could a novice actually use AI to perform wet-lab tasks they could not otherwise perform? Over the summer, I tested this question directly with Active Site (formerly called Panoplia Laboratories). We recruited 153 novices and randomly divided them into an LLM group and an Internet-only group. Over 8 weeks, participants performed fundamental wet-lab tasks involved in molecular biology workflows like reconstructing a virus from a genetic sequence. We found that, while AI showed signs of helpfulness at individual steps, it did not produce a significant effect on end-to-end success across the three core tasks together – a result that surprised many experts . The result provided a mid-2025 snapshot of how well AIs assist novices at molecular biology. I think there are at least two reasons why this result is very informative: It surpr…
OpenAI to test ads in ChatGPT as it burns through billions, The Drama at Thinking Machines, STEM: Scaling Transformers with Embedding Modules
OpenAI to test ads in ChatGPT as it burns through billions, Sequoia to invest in Anthropic, Zhipu AI breaks US chip reliance, The Drama at Thinking Machines Is Riveting Silicon Valley
AI models’ relationship with our data is getting more dynamic, contextual and private—and the stakes are high The Claim Earlier this year, Elon Musk claimed that ‘all human data for AI training has been exhausted’. Ilya Sutskever, a co-founder of OpenAI, has said the world has reached ‘peak data’. A recent episode of the BBC’s […] The post No, AI hasn’t run out of data appeared first on OpenMined .
STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation robyn.cherinka… Wed, 11/12/2025 - 14:40 Off-policy evaluation (OPE) estimates the performance of a target policy using offline data collected from a behavior policy, and is crucial in domains such as robotics or healthcare where direct interaction with the environment is costly or unsafe. Existing OPE methods are ineffective for high-dimensional, long-horizon problems, due to exponential blow-ups in variance from importance weighting or compounding errors from learned dynamics models. To address these challenges, we propose STITCH-OPE, a model-based generative framework that leverages denoising diffusion for long-horizon OPE in high-dimensional state and action spaces. Starting with a diffusion model pre-trained on the behavior data, STITCH-OPE generates synthetic trajectories from the target policy by guiding the denoising process using the score function of the target policy. STITCH-OPE proposes two technical innovations that make it advantageous for OPE: (1) prevents over-regularization by subtracting the score of the behavior policy during guidance, and (2) generates long-horizon trajectories by stitching partial trajectories together end-to-end. We provide a theoretical guarantee that under mild assumptions, these modifications result in an exponential reduction in variance versus long-horizon trajectory diffusion. Experiments on the D4RL and OpenAI Gym benchmarks show substantial improveme…
“Folha de S.Paulo filed a lawsuit against OpenAI on Wednesday [Aug. 20], demanding that the owner of the ChatGPT artificial intelligence platform stop collecting and using the newspaper’s content without authorization or payment. The suit accuses OpenAI of unfair competition and copyright infringement, stating that ‘the defendant develops and improves its AI tool [...] based […] The post Folha de S.Paulo files lawsuit against OpenAI for unfair competition and copyright infringement appeared first on LatAm Journalism Review by the Knight Center .
The Future of Life Institute’s 2025 summer update to its AI Safety Index shows some companies making incremental progress, but dangerous gaps remain in key categories such as risk assessment and controlling the systems they plan to build.
The recent disruption caused by DeepSeek’s R1 model sent shockwaves through the AI community, demonstrating that Chinese AI advancements may have been underestimated. The model’s performance, rivaling some of the most advanced offerings from OpenAI and Anthropic at a fraction of the cost, signaled a new era of competition in artificial intelligence. However, DeepSeek is […] The post Beyond DeepSeek: An Overview of Chinese AI Tigers and Their Cutting-Edge Innovations appeared first on TOPBOTS .
Preliminary benchmark results for the new OpenAI o1 models.
After studying how companies deploy generative AI applications, I noticed many similarities in their platforms. This post outlines the common components of a generative AI platform, what they do, and how they are implemented. I try my best to keep the architecture general, but certain applications might deviate. This is what the overall architecture looks like. This is a pretty complex system. This post will start from the simplest architecture and progressively add more components. In its simplest form, your application receives a query and sends it to the model. The model generates a response, which is returned to the user. There are no guardrails, no augmented context, and no optimization. The Model API box refers to both third-party APIs (e.g., OpenAI, Google, Anthropic) and self-hosted APIs. From this, you can add more components as needs arise. The order discussed in this post is common, though you don’t need to follow the exact same order. A component can be skipped if your system works well without it. Evaluation is necessary at every step of the development process. Enhance context input into a model by giving the model access to external data sources and tools for information gathering. Put in guardrails to protect your system and your users. Add model router and gateway to support complex pipelines and add more security. Optimize for latency and costs with cache. Add complex logic and write actions to maximize your system’s capabilities. Observability, which allow…
The use of large language models in the real world has strongly accelerated by the launch of ChatGPT. We (including my team at OpenAI, shoutout to them) have invested a lot of effort to build default safe behavior into the model during the alignment process (e.g. via RLHF ). However, adversarial attacks or jailbreak prompts could potentially trigger the model to output something undesired. A large body of ground work on adversarial attacks is on images, and differently it operates in the continuous, high-dimensional space. Attacks for discrete data like text have been considered to be a lot more challenging, due to lack of direct gradient signals. My past post on Controllable Text Generation is quite relevant to this topic, as attacking LLMs is essentially to control the model to output a certain type of (unsafe) content.
For a long time, each ML model operated in one data mode – text (translation, language modeling), image (object detection, image classification), or audio (speech recognition). However, natural intelligence is not limited to just a single modality. Humans can read, talk, and see. We listen to music to relax and watch out for strange noises to detect danger. Being able to work with multimodal data is essential for us or any AI to operate in the real world. OpenAI noted in their GPT-4V system card that “ incorporating additional modalities (such as image inputs) into LLMs is viewed by some as a key frontier in AI research and development .” Incorporating additional modalities to LLMs (Large Language Models) creates LMMs (Large Multimodal Models). Not all multimodal systems are LMMs. For example, text-to-image models like Midjourney, Stable Diffusion, and Dall-E are multimodal but don’t have a language model component. Multimodal can mean one or more of the following: Input and output are of different modalities (e.g. text-to-image, image-to-text) Inputs are multimodal (e.g. a system that can process both text and images) Outputs are multimodal (e.g. a system that can generate both text and images) This post covers multimodal systems in general, including LMMs. It consists of 3 parts. Part 1 covers the context for multimodality, including why multimodal, different data modalities, and types of multimodal tasks. Part 2 discusses the fundamentals of a multimodal system, using the…
I have been studying policy gradients recently but found different expositions from different sources, which greatly confused me. From the book "Reinforcement Learning: an Introduction (Sutton & Barto Chapter 13)", we get the following policy gradient: $$ \nabla J(\theta) = \mathbb E_\pi\left[G_t\nabla\log\pi(A_t | S_t, \theta)\right]. $$ As we can observe from the equation, it does not relate to trajectory distributions . However, a more intuitive and widely-used introduction to policy gradient starts from defining the distribution of trajectories: $p(\tau)$ . For example, in OpenAI Spinning Up , the policy gradient has the form similar to the following equation: $$ \nabla J(\theta) = \mathbb E_{\tau \sim \pi}\left[\sum_{t=0}^{T}G_t\nabla_\theta\log\pi_\theta(a_t | s_t)\right]. $$ The confusion comes from the fact that the first policy gradient does not have a summation over timestamps and is not sampling from trajectories, but the second samples from trajectories and has a summation. I did find some relevant questions about this confusion, but none of them seemed to have a good answer. Also, I could not identify any source that explained the difference/connection between the two forms. My question is why are there two different ways to describe the policy gradient and are the two forms mathematically equivalent? Update I found a great RL theory book (draft) written by some expert Professors in this field that shows two different formulations: https://rltheorybook.github.io…
I'm asking specifically about ChatGPT4, but the question could apply to either that or 3.5. When you use the ChatGPT API, it's of course up to you to manage conversation history and include that in successive API calls within available context length in whatever manner you choose. In the case of the web interface, they've obviously implemented some system to manage conversation history in context. It clearly doesn't "remember" the entire thing once the conversation gets very long, because it doesn't have infinite context length. So, what strategy does it use to send conversation history to the model once it's exceeded its context length? Does it truncate all content prior to the max context length? Does it summarize earlier parts of conversations to more efficiently fit them within the context? Does it do some dynamic strategy combining many inputs? Or is this just another case where we just don't know, and OpenAI is being tight-lipped about what it's actually doing?
As discussed in this question , the policy gradient algorithms given in Reinforcement Learning: An Introduction use the gradient \begin{align*} \gamma^t \hat A_t \nabla_{\theta} \log \pi(a_t \, | \, s_t, \theta) \end{align*} where $\hat A_t$ is the advantage estimate for step $t$ . For example, $\hat A_t = r_t + \gamma V(s_{t+1}) - V(s_t)$ in the one-step actor-critic algorithm given in section 13.5. In the answers to the linked question, it is claimed that the extra discounting is "correct", which implies that it should be included. If I look in the literature to a seminal paper such as Proximal Policy Optimization Algorithms by OpenAI, they do not include the extra discounting factor, i.e. they use a gradient defined as \begin{align*} \hat A_t \dfrac{\nabla_{\theta}\pi(a_t \, | \, s_t, \theta)}{\pi(a_t \, | \,s_t, \theta_{\rm old})} \end{align*} which does not include the discounting factor (of course, it's dealing with the off-policy case, but I don't see how that would make a difference in terms of the discounting). OpenAI's implementation of PPO also does not include the extra discounting factor. So, how am I supposed to interpret this discrepancy? I agree that the extra discounting factor should be present, from a theoretical standpoint. Then, why is it not in the OpenAI code or paper?
Discussion: Discussion Thread for comments, corrections, or any feedback. Translations: Korean, Russian Summary: The latest batch of language models can be much smaller yet achieve GPT-3 like performance by being able to query a database or search the web for information. A key indication is that building larger and larger models is not the only way to improve performance. Video The last few years saw the rise of Large Language Models (LLMs) – machine learning models that rapidly improve how machines process and generate language. Some of the highlights since 2017 include: The original Transformer breaks previous performance records for machine translation. BERT popularizes the pre-training then finetuning process, as well as Transformer-based contextualized word embeddings. It then rapidly starts to power Google Search and Bing Search. GPT-2 demonstrates the machine’s ability to write as well as humans do. First T5, then T0 push the boundaries of transfer learning (training a model on one task, and then having it do well on other adjacent tasks) and posing a lot of different tasks as text-to-text tasks. GPT-3 showed that massive scaling of generative models can lead to shocking emergent applications (the industry continues to train larger models like Gopher, MT-NLG…etc). For a while, it seemed like scaling larger and larger models is the main way to improve performance. Recent developments in the field, like DeepMind’s RETRO Transformer and OpenAI’s WebGPT, reverse this tre…
[Updated on 2022-03-13: add expert choice routing .] [Updated on 2022-06-10]: Greg and I wrote a shorted and upgraded version of this post, published on OpenAI Blog: “Techniques for Training Large Neural Networks”
--> This is a long overdue blog post on Reinforcement Learning (RL). RL is hot! You may have noticed that computers can now automatically learn to play ATARI games (from raw game pixels!), they are beating world champions at Go , simulated quadrupeds are learning to run and leap , and robots are learning how to perform complex manipulation tasks that defy explicit programming. It turns out that all of these advances fall under the umbrella of RL research. I also became interested in RL myself over the last ~year: I worked through Richard Sutton’s book , read through David Silver’s course , watched John Schulmann’s lectures , wrote an RL library in Javascript , over the summer interned at DeepMind working in the DeepRL group, and most recently pitched in a little with the design/development of OpenAI Gym , a new RL benchmarking toolkit. So I’ve certainly been on this funwagon for at least a year but until now I haven’t gotten around to writing up a short post on why RL is a big deal, what it’s about, how it all developed and where it might be going. Examples of RL in the wild. From left to right : Deep Q Learning network playing ATARI, AlphaGo, Berkeley robot stacking Legos, physically-simulated quadruped leaping over terrain. It’s interesting to reflect on the nature of recent progress in RL. I broadly like to think about four separate factors that hold back AI: Compute (the obvious one: Moore’s Law, GPUs, ASICs), Data (in a nice form, not just out there somewhere on the int…