Introduction
OpenAI has reached a remarkable milestone with Deep Research, a breakthrough capability that lets AI models search the internet, reason through multiple steps, and make sense of content over long time horizons.
This system works differently from regular search tools. It takes 5 to 30 minutes to deliver results because complex tasks need more thinking time, and it produces detailed, fully cited research reports that rival expert human work. The model proved its worth by scoring 26.6% accuracy on Humanity’s Last Exam, a demanding benchmark that measures AI performance across expert-level subjects. Deep Research shows how advanced search technologies can plan and execute complex steps to discover, study, and present information. The system performs better when it gets more time to process and explore, which makes it well suited to market research, academic studies, and personal product research.
- Introduction
- OpenAI unveils Deep Research to transform web search
- Deep Research enables multi-step reasoning across the internet
- OpenAI removes latency limits to allow deeper thinking
- Live demo shows Deep Research analyzing mobile app markets
- Second demo reveals Deep Research’s consumer shopping power
- Deep Research is powered by the o3 reasoning model
- Benchmarks show Deep Research outperforms on expert tasks
- OpenAI plans to expand Deep Research to more user tiers
- Conclusion: Deep Research Marks a Radical Alteration in AI-Powered Knowledge Work
- FAQs
OpenAI unveils Deep Research to transform web search
OpenAI has revealed “Deep Research,” a groundbreaking capability that changes how AI works with web content. This innovative tool goes beyond quick results. It does detailed, multi-step research online and completes tasks in minutes that would take human researchers many hours.
Deep Research works as an independent agent for users. It takes a prompt and searches through hundreds of online sources. The system analyzes and blends information to create detailed, analyst-level reports. The tool provides clear citations and shows its reasoning process, which makes fact-checking easy for users.
A specialized version of OpenAI’s upcoming o3 model powers this technology. The model focuses on web browsing and data analysis, using its reasoning abilities to understand and analyze text, images, and PDFs from the internet. The system adapts its research methods based on new information it finds during the process.
Regular search tools give instant results. Deep Research takes more time—usually 5 to 30 minutes, based on how complex the question is. This extra time allows for:
- Searching through hundreds of sources on its own
- Deep analysis of the information it finds
- Creation of detailed reports with proper documentation
- Clear citations for all sources used
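The loop described above can be sketched in a few lines of Python. This is a hypothetical illustration only: the source pool, its fields, and the keyword matching are stand-ins, not OpenAI's retrieval logic.

```python
# Hypothetical sketch of the autonomous research loop described above.
# The source pool, its fields, and the keyword matching are illustrative
# stand-ins, not OpenAI's actual implementation.

def research(query, source_pool, max_rounds=3):
    """Repeatedly search, collect cited findings, and follow new leads."""
    findings = []          # (fact, source_url) pairs gathered so far
    frontier = [query]     # topics still to investigate
    for _ in range(max_rounds):
        next_frontier = []
        for topic in frontier:
            # "Search": keep sources whose text mentions the topic.
            hits = [s for s in source_pool if topic.lower() in s["text"].lower()]
            for hit in hits:
                findings.append((hit["text"], hit["url"]))
                next_frontier.extend(hit.get("leads", []))  # adapt the plan
        if not next_frontier:
            break
        frontier = next_frontier
    # Every claim in the final report carries its citation.
    return [f"{fact} [{url}]" for fact, url in findings]

sources = [
    {"url": "ex.com/a", "text": "Powder skis suit deep snow", "leads": ["rocker"]},
    {"url": "ex.com/b", "text": "Rocker profiles improve float", "leads": []},
]
report = research("powder", sources)
```

The key property is the `leads` follow-up list: new findings expand the frontier, so the plan grows as the search proceeds rather than being fixed up front.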
Deep Research’s performance stands out. OpenAI says the system scored 26.6% on “Humanity’s Last Exam,” a benchmark that tests expert reasoning across more than 100 subjects. That score far surpasses what GPT-4o (3.3%) and xAI’s Grok-2 (3.8%) achieved on the same test.
“Deep research is built for people who do intensive knowledge work in areas like finance, science, policy, and engineering and need thorough, precise, and reliable research,” OpenAI stated. The company adds that the tool helps smart shoppers who want personalized advice on big purchases like cars, appliances, or furniture.
The difference between Deep Research and basic search is clear. Standard search gives quick answers in seconds. Deep Research takes longer to deliver detailed, well-documented, and complete responses. It shows a fundamental change from finding information to creating knowledge.
Pro subscribers can use Deep Research for $200 per month, with a 100-query limit. OpenAI plans to offer access to Plus, Team, and Enterprise tiers soon. The web-only feature should expand to mobile and desktop platforms this month.
This launch comes after similar tools from competitors. Google brought its Deep Research to Gemini Advanced subscribers in December 2024. Perplexity and other AI startups work on similar features. OpenAI emphasizes that Deep Research isn’t just another AI tool—it marks real progress toward their bigger goal of developing artificial general intelligence (AGI).
“Knowing how to blend knowledge helps create new knowledge,” OpenAI noted. This shows how Deep Research moves AI systems closer to doing original scientific research. The launch proves Deep Research isn’t just a better web search. It revolutionizes how AI can improve human knowledge work.
Deep Research enables multi-step reasoning across the internet
Deep Research’s power comes from its ability to reason through complex problems using vast internet resources. The system breaks complex queries into smaller, manageable sub-tasks that it can handle sequentially or in parallel as needed.
The model adapts its plan as it finds new information
Deep Research stands out because it adapts dynamically. The system doesn’t stick to a fixed path. It changes its approach based on new information it finds. This adaptive reasoning lets the AI shift its research strategy when it runs into unexpected or conflicting data.
Deep Research uses chain-of-thought (CoT) reasoning to analyze problems step by step, much as a human analyst would. This logical approach breaks complex queries into clear steps and creates a flexible research process. Users can watch the system work through a sidebar that shows:
- Websites being visited
- Information being analyzed
- Reasoning steps being taken
- Plan adjustments in real-time
This clear view helps users understand how the AI reaches its conclusions. The system creates a detailed research plan when it gets a complex query. It then manages the work by figuring out which tasks need to happen in order and which ones can happen at the same time.
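The scheduling behavior described here can be illustrated with a small dependency graph. The sub-task names below are invented for the example; Python's standard-library `graphlib` handles the ordering, grouping independent tasks into batches that could run at the same time.

```python
# Illustrative planner sketch: sub-tasks with dependencies run in order,
# while independent sub-tasks form a batch that could run in parallel.
# Task names are invented for this example, not Deep Research internals.
from graphlib import TopologicalSorter

subtasks = {
    "gather market data": set(),
    "gather app reviews": set(),
    "compare regions": {"gather market data"},
    "write report": {"compare regions", "gather app reviews"},
}

ts = TopologicalSorter(subtasks)
ts.prepare()
batches = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # no unmet dependencies: a parallel batch
    batches.append(ready)
    ts.done(*ready)
```

Here the two "gather" tasks land in the first batch together, while report writing waits until everything it depends on is done.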
The system’s adaptability proves most valuable with technical topics. To name just one example, when analyzing chemistry concepts like “differences between pure- and mixed-gas sorption for glassy polymers,” Deep Research taps into open-source information. It clarifies key problems, analyzes PDFs, and builds its understanding before it creates a complete report, which could save researchers up to 4 hours of work.
Combines and cites content like a human analyst
Deep Research doesn’t just gather information. It combines content into clear, well-laid-out reports that match what human analysts produce. The system reviews information, spots key themes and conflicts, and organizes reports logically. It even checks its work multiple times to boost clarity and detail.
Deep Research’s output has complete documentation with clear citations and explains its thinking process, which makes fact-checking easy. This citation system works better than traditional AI tools and solves one of the biggest problems with large language models: checking facts and sources.
The system can handle more than just text:
- Analysis of images and visual data
- Interpretation of PDFs and academic papers
- Creation of formatted reports with tables
- Integration of various sources into one clear story
This careful documentation sets Deep Research apart from basic AI assistants. It creates analyst-level reports with proper citations instead of just making responses that sound right. Users can trace every piece of information back to its source.
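One way to picture this source traceability is a report renderer where every claim must carry a citation. The class, claims, and URLs below are hypothetical placeholders, not Deep Research's actual output format.

```python
# Illustrative only: one way to keep every claim traceable to a source,
# as the cited-report format described above requires. Not OpenAI code.
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    source_url: str

def render_report(title, claims):
    """Render claims with numbered citations and a trailing source list."""
    lines = [f"# {title}", ""]
    sources = []
    for claim in claims:
        if claim.source_url not in sources:
            sources.append(claim.source_url)
        n = sources.index(claim.source_url) + 1
        lines.append(f"{claim.text} [{n}]")
    lines.append("")
    lines += [f"[{i + 1}] {url}" for i, url in enumerate(sources)]
    return "\n".join(lines)

report = render_report("Ski market", [
    Claim("Powder skis dominate Japan sales.", "ex.com/a"),
    Claim("Width above 105mm is preferred.", "ex.com/a"),
    Claim("Rocker profiles improve float.", "ex.com/b"),
])
```

Because the citation number is derived from the source list, a claim cannot appear in the report without a verifiable reference attached.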
Deep Research can turn hours of research into quick summaries. The system once created a 16-page document with 22 cited sources in just 6 minutes. While it might take 5-30 minutes to finish its research, longer than normal searches, this extra time leads to more thorough and reliable results.
OpenAI knows the system isn’t perfect. It sometimes can’t tell reliable information from rumors, and doesn’t always show uncertainty clearly. Still, their tests show it makes up fewer facts than current ChatGPT models.
OpenAI removes latency limits to allow deeper thinking
OpenAI has made a bold move away from standard AI optimization strategies. They removed latency constraints for Deep Research and put quality ahead of speed. This choice marks a fundamental change in how we review and deploy AI systems for complex research tasks.
AI applications usually aim to be quick. Typical latency for GPT-3.5-turbo ranges from 500ms to 1500ms, while GPT-4 takes between 1000ms and 3000ms. Deep Research takes a different path. It needs 5 to 30 minutes to complete queries—a timeframe that standard AI interactions would never accept.
Why do longer response times improve quality?
Research shows a clear link between thinking time and output quality. Studies reveal that overly quick responses can suggest a lack of deliberation, which shakes users’ confidence in the AI’s conclusions. AI researchers have long observed that more processing time tends to produce better responses.
Time becomes even more crucial when dealing with complex reasoning tasks. When generative AI models work on sophisticated multi-step reasoning, they can:
- Find more ways to solve problems
- Check information from multiple sources
- Fix mistakes and try different approaches
- Create detailed and nuanced outputs
The benefits are clear in the numbers. High-end reasoning LLMs need over 30 seconds to generate one quality response. Deep Research goes beyond this thinking time, which helps it solve more complex problems.
Scientists have found that users trust AI most when it responds within one to three seconds. Deep Research goes way past this window, which signals OpenAI’s belief that some complex tasks simply need much longer processing times.
“If the enterprise application allows for a larger latency and cost budget, [models] can generate longer reasoning traces using the same budget and further improve quality,” researchers note when studying similar approaches. This captures Deep Research’s core idea—trading speed for better results.
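A toy way to see this speed-for-quality trade (an illustrative assumption, not OpenAI's mechanism): a larger budget buys more candidate reasoning chains, and the best-scoring one is kept.

```python
# Toy illustration of trading latency for quality: more candidate chains
# fit in a larger budget, and the best-verified one is kept. The random
# score is a stand-in for a real answer verifier.
import random

def solve_once(rng):
    """Stand-in for one reasoning chain: returns (answer, verifier score)."""
    score = rng.random()
    return ("good" if score > 0.8 else "weak"), score

def solve_with_budget(n_chains, seed=0):
    rng = random.Random(seed)
    candidates = [solve_once(rng) for _ in range(n_chains)]
    return max(candidates, key=lambda c: c[1])  # keep the best candidate

fast_answer = solve_with_budget(1)    # small latency budget: one attempt
slow_answer = solve_with_budget(32)   # larger budget: many attempts
```

Since the larger budget's candidate pool contains everything the smaller one produced, its best answer can only match or beat the fast one.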
How does this align with OpenAI’s AGI roadmap?
The focus on quality over speed fits with OpenAI’s bigger plans for artificial general intelligence (AGI). Their planning documents state, “We believe this is the best way to carefully steward AGI into existence; a gradual transition to a world with AGI is better than a sudden one”.
This careful approach gives “people, policymakers, and institutions time to understand what’s happening, personally experience the benefits and downsides of these systems, adapt our economy, and to put regulation in place”. Deep Research shows this philosophy by:
- Providing clear documentation of its reasoning
- Letting users watch its work live
- Giving fully cited outputs that humans can verify
- Showing the value of careful, methodical AI reasoning
OpenAI states clearly that “as our systems get closer to AGI, we are becoming increasingly cautious with the creation and deployment of our models”. They know that “our decisions will require much more caution than society usually applies to new technologies, and more caution than many users would like”.
Removing latency constraints means more than just a technical choice. It shows a belief that quality, safety, and transparency matter more than quick responses when working with advanced AI. Through Deep Research, OpenAI proves that real AI progress sometimes needs patience. They let systems think deeply about complex problems, just like human experts do.
Live demo shows Deep Research analyzing mobile app markets
OpenAI showed Deep Research’s capabilities in a high-profile demo. The system broke down the complex relationship between language learning applications and mobile market penetration. This task needed extensive cross-disciplinary expertise and data synthesis.
Query: language learning and mobile penetration trends
The demo started with a direct yet multi-layered question: analyze current trends in language learning mobile applications and their market reach across regions. Deep Research’s autonomous workflow kicked in, starting a complete investigation that would normally take hours of expert human analysis.
Attendees watched Deep Research’s sidebar show its thought process, website visits, and information review in real time. The system first identified key areas to investigate, including language learning app usability, AI integration possibilities, and mobile technology adoption rates across markets.
Deep Research searched through hundreds of sources, gathered relevant information, and refined its analysis during the demo. This process took about 15 minutes to complete. The audience saw the system:
- Find authoritative research on language learning technologies
- Get demographic data about mobile app usage patterns
- Find academic studies on AI implementation in language education
- Compare market reach statistics across global regions
Deep Research showed it could do in-depth, multi-step reasoning across multiple sources, unlike traditional search tools that just provide links. The system changed its search path based on new information, following a research approach similar to human analysts.
Output: formatted report with tables and recommendations
Deep Research produced a complete report with clear sections and proper citations to sources. The report started with an executive summary of key findings, followed by detailed sections about market trends, user priorities, and technological breakthroughs.
A notable section had a detailed table comparing language learning applications based on:
Category | Current Implementation | AI Enhancement Potential | User Engagement Impact |
--- | --- | --- | --- |
Usability & Design | Structured approaches with linear progression | Adaptive interfaces responding to learning styles | High – Cited as primary user concern |
Convenience & Motivation | Gamification elements, daily reminders | Personalized habit formation systems | Medium-High – Critical for retention |
Community Building | Basic forums and chat functions | AI-driven conversation partners, cultural context | Medium – Important for advanced learners |
The report directly addressed how language learning applications offer benefits like “establishing autonomy in the acquisition of language skills, enabling learners to pick up new vocabulary and learn correct pronunciations at their convenience, and increased motivation”. It also acknowledged challenges like “the lack of human interaction, failure to account for the relationship between language and context, and unexplained language rules”.
The report included actionable recommendations to apply AI in language learning applications:
- Creating intake surveys for personalized learner profiles based on specific goals (conversational, business, or age-appropriate communication)
- Adding immediate assessment capabilities to create non-linear learning pathways
- Finding learner strengths and weaknesses with real-time comprehension checks
- Building culturally attuned AI systems that recognize interconnected cultural aspects of languages
Deep Research’s analysis showed that AI integration in language learning apps has “numerous applications including algorithms that personalize learning for individual learners, identify and structure lessons based on strengths, evaluate progress, and simulate interactive environments”. These breakthroughs “enhance both language comprehension and production skills, optimizing outcomes of language teaching and learning”.
The demo showed how Deep Research works as more than just a search tool—it’s a complete analytical system that produces expert-level insights quickly. One observer noted that the system worked like “a personal research assistant” capable of “automatically browsing up to hundreds of websites, thinking through findings, and creating insightful multi-page reports”.
Second demo reveals Deep Research’s consumer shopping power
OpenAI’s second showcase gave a powerful example of how Deep Research helps with everyday buying decisions. The demo showed how the system turns hours of product research into quick, tailored recommendations.
Query: Best skis for Japan trip
The next demo tackled what seemed like a simple question: “What are the best skis for a trip to Japan?” This basic query needs careful analysis of many factors. These include Japan’s special powder conditions, how well you ski, different types of skis, brand options, and what you want to spend.
Deep Research jumped right into specialized skiing forums, brand websites, expert reviews, and Japan-focused travel guides. The system looked at:
- Japan’s special snow features (especially the famous “Japow” powder)
- Ski types that work best in these conditions
- What experts suggest for different skill levels
- Prices from various sellers
- Technical details that matter for powder skiing
Viewers watched as the system moved between sources and built a detailed picture of what Japanese skiing needs. Deep Research showed its smarts by focusing on powder performance over other features because it understood how unique Japanese snow is.
Output: product comparison with color and performance filters
The system took about 12 minutes to create a detailed ski recommendation report that’s easy to use. Unlike regular search results that just give you links, this report laid out useful comparisons and insights.
The report centered on a detailed table comparing five ski models perfect for Japanese conditions. Each option showed:
Ski Model | Optimal Conditions | Width | Turn Radius | Price Range | Special Features |
--- | --- | --- | --- | --- | --- |
Model A | Deep powder | 110mm | 19m | $699-749 | Rockered tip, camber underfoot |
Model B | Powder/variable | 102mm | 16m | $599-649 | All-mountain versatility |
Model C | Powder/trees | 106mm | 17m | $749-799 | Lightweight construction |
Model D | Powder/groomed | 98mm | 15m | $549-599 | Budget-friendly option |
Model E | Deep powder/expert | 118mm | 22m | $899-949 | Professional-grade construction |
The report used color-coded sections to highlight key performance features. Skis got visual ratings for how well they handled powder, tree runs, high speeds, and different snow conditions.
This went way beyond basic shopping help. Deep Research tailored its suggestions based on real understanding. The system explained how different ski widths work in Japanese powder compared to North American or European snow. The report even covered practical travel tips about airline ski policies and rental options at Japanese resorts.
The demo proved how Deep Research transforms complex buying decisions by analyzing product information more thoroughly than any search engine could. One observer noted that this bridges the gap between simple product searches and hiring a personal shopping consultant.
The quality and usefulness of the results showed why waiting those extra minutes makes sense when you’re making big purchase decisions.
Deep Research is powered by the o3 reasoning model
OpenAI’s sophisticated o3 reasoning model forms the technical foundation of Deep Research. This state-of-the-art system has been fine-tuned specifically to conduct thorough, multi-step analysis, and it marks remarkable progress in how artificial intelligence processes complex information and navigates varied knowledge sources.
Trained with reinforcement learning on browsing tasks
The o3 model that powers Deep Research went through extensive end-to-end reinforcement learning on challenging browsing and reasoning tasks across many domains. The system learned to plan and execute multi-step searches, and even to backtrack when it found unexpected information. This approach allows the model to:
- Generate chains of thought before responding
- Self-check reasoning pathways for accuracy
- Plan sequences of actions based on query requirements
- Adapt strategies as new information emerges
The o3 model differs from previous models like GPT-4. Instead of generating single-pass answers from internal knowledge, it launches complete multi-step analyses that adjust dynamically as circumstances change. This capability comes directly from its reinforcement learning foundation: the model learned not just how to use tools but when to deploy them based on desired outcomes.
The deep learning algorithms let the system handle complex tasks through simulated reasoning. It models thought processes internally before responding. This training approach creates a versatile system that manages open-ended situations effectively, from analyzing market trends to evaluating consumer products.
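As a loose analogy (our assumption for illustration, not OpenAI's training code), the idea of learning *when* to deploy a tool from outcomes alone resembles a bandit-style loop: reward arrives only from results, and the value estimates for each tool call adjust accordingly.

```python
# Minimal epsilon-greedy sketch of outcome-driven tool selection. The
# action names and the deterministic reward are invented for this toy;
# real RL on browsing tasks is vastly more complex.
import random

def train_tool_choice(actions, reward_fn, episodes=2000, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = {a: 0.0 for a in actions}        # estimated value of each tool call
    counts = {a: 0 for a in actions}
    for _ in range(episodes):
        if rng.random() < eps:
            a = rng.choice(actions)       # explore a random tool
        else:
            a = max(q, key=q.get)         # exploit best estimate so far
        r = reward_fn(a)
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]    # incremental mean update
    return q

# Toy task: only opening the page produces a useful outcome.
q = train_tool_choice(
    ["search", "open_page", "backtrack"],
    lambda a: 1.0 if a == "open_page" else 0.0,
)
```

After training, the rewarded tool call dominates the value estimates, which is the essence of "learning when to deploy a tool" from outcomes rather than scripts.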
Capable of using Python, reading PDFs, and embedding images
o3 stands out as OpenAI’s first reasoning model that can use and combine every tool within ChatGPT’s ecosystem. Its technical capabilities extend well beyond simple text processing:
- Executes Python code for calculations and data analysis
- Generates and iterates on graphs and visualizations
- Reads and interprets PDF documents
- Processes and understands images
- Embeds both website images and generated graphics in responses
The model doesn’t just see images—it “thinks with” them by integrating visual data directly into its reasoning process. This breakthrough helps o3 tackle problems that blend visual and textual reasoning. It works even with blurry, reversed, or low-quality images.
Users can now upload documents for analysis. Deep Research extracts meaningful information from charts and diagrams and incorporates relevant visuals into its complete reports. The system manipulates images during analysis by rotating, zooming, or transforming them. These capabilities substantially expand its problem-solving abilities beyond traditional text-only AI systems.
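A simple dispatch pattern gives a rough feel for this multi-format handling. The handlers below are hypothetical placeholders, not ChatGPT's actual PDF, image, or code tooling.

```python
# Hypothetical tool-dispatch sketch: each input type is routed to a
# matching handler and the results are merged into one analysis. The
# handlers are placeholders standing in for real PDF/image/code tools.

def analyze(item):
    handlers = {
        "pdf": lambda d: f"extracted text from {d['name']}",
        "image": lambda d: f"described figure in {d['name']}",
        "python": lambda d: f"computed result: {eval(d['code'])}",  # toy calc
    }
    return handlers[item["kind"]](item)

inputs = [
    {"kind": "pdf", "name": "paper.pdf"},
    {"kind": "image", "name": "chart.png"},
    {"kind": "python", "code": "2 + 2"},
]
results = [analyze(i) for i in inputs]
```

The point is the routing: one query can fan out across document, image, and computation handlers, and their outputs feed a single report.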
Benchmarks show Deep Research outperforms on expert tasks

Latest measurements show that OpenAI’s deep research system substantially outperforms other models on complex tasks that demand expert knowledge. The results show how well Deep Research performs compared with human researchers and other AI systems.
Pass rates by economic value and task complexity
Deep Research achieved impressive results at all difficulty levels on GAIA (the General AI Assistants benchmark). The system scored 78.66% on Level 1 tasks, 73.21% on Level 2, and 58.03% on Level 3, for an average accuracy of 72.57%. For comparison, the previous top score across all levels was 63.64%.
The most convincing proof of Deep Research’s abilities comes from the BrowseComp test, where it got 51.5% accuracy on questions that needed extensive online research. This stands out because human testers gave up on 70% of these questions after trying for two hours. The humans who did attempt the questions only answered about 30%, and got 14% of their answers wrong.
The research team also found a direct connection between Deep Research’s performance and computing power: accuracy climbs above 75% when the system runs more parallel attempts and checks multiple candidate answers.
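The parallel-attempts effect can be seen with basic probability: if one attempt is right some fraction of the time, a majority vote over independent attempts is right more often. The 60% single-attempt accuracy below is an illustrative assumption, and this self-consistency argument is a generic one, not OpenAI's reported method.

```python
# Why running more parallel attempts and checking answers helps: the
# probability that a majority of n independent attempts is correct rises
# with n. The 60% single-attempt accuracy is an assumed figure.
from math import comb

def p_majority_correct(n, p=0.6):
    """P(more than half of n independent attempts are correct), n odd."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

acc_single = p_majority_correct(1)   # one attempt: just p
acc_voted = p_majority_correct(25)   # majority vote over 25 attempts
```

With these assumed numbers, a single attempt succeeds 60% of the time while a 25-way majority vote succeeds well over 80% of the time, at the cost of 25x the compute.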
Performance on Humanity’s Last Exam and internal tests
Deep Research has shown striking results on Humanity’s Last Exam (HLE), a demanding test with 2,700 challenging questions drawn from over 100 expert subjects. The system reached 26.6% accuracy, 183% above the next-best score recorded just two weeks earlier.
Here’s how this score compares to other leading AI models:
Model | HLE Accuracy (%) |
--- | --- |
OpenAI Deep Research | 26.6 |
Perplexity Deep Research | 21.1 |
OpenAI o3-mini (high) | 13.0 |
DeepSeek-R1 | 9.4 |
OpenAI o1 | 9.1 |
Gemini Thinking | 6.2 |
Claude 3.5 Sonnet | 4.3 |
Grok-2 | 3.8 |
GPT-4o | 3.3 |
It’s worth mentioning that experts consider 50% accuracy the key milestone at which AI matches human expert capability on this benchmark. At current improvement rates, that milestone could be reached within 12 months.
OpenAI plans to expand Deep Research to more user tiers
OpenAI continues to expand access to Deep Research through a phased rollout plan. The company has steadily expanded availability beyond the original launch group since May 2025.
Currently in Pro, coming to Plus, Team, and Enterprise
Deep Research made its debut exclusively for Pro tier subscribers at $200 per month before expanding to other subscription levels. OpenAI rolled the feature out to Pro users in the United Kingdom, Switzerland, and the European Economic Area on February 5, 2025, and Plus users got access on February 25.
OpenAI substantially increased query allocations for users of all tiers in April 2025:
User Tier | Previous Monthly Limit | Current Monthly Limit |
--- | --- | --- |
Pro | 120 | 250 |
Plus/Team/Enterprise/Edu | 10 | 25 |
Free | 0 | 5 |
This broader distribution became possible through a “new lightweight version of deep research powered by a version of o4-mini, designed to be more cost-efficient while preserving high quality”. Users who reach their monthly limit with the standard model automatically switch to this lightweight version until the monthly reset.
Users can check their remaining tasks by hovering over the ‘Deep research’ button. The limits reset every 30 days from first use.
Future integration with custom data and enterprise tools
OpenAI continues to develop more sophisticated integration capabilities. The GitHub connector for Deep Research works globally for Plus, Pro, and Team users. Enterprise access will be announced “at a later date”.
One key limitation still needs attention: “Currently, deep research can access the open web and any uploaded files, but is not able to access private data sources (e.g., subscription-based sources, internal resources)”. This limitation should fade as OpenAI builds out enterprise integrations.
The system will grow to include “embedded images, data visualizations, and other analytic outputs”. Enterprise customers will get deeper customization options and advanced research capabilities.
The integration roadmap balances accessibility with premium features. This helps maintain clear value differences between the $20/month Plus tier with 25 queries and the $200/month Pro offering with 250 queries.
Conclusion: Deep Research Marks a Radical Alteration in AI-Powered Knowledge Work
OpenAI’s Deep Research represents a fundamental shift in how artificial intelligence tackles complex information tasks. The technology breaks free from quick response constraints and prioritizes complete analysis over speed. Users seeking thorough, well-cited research on complex topics get much better results.
The performance metrics tell a compelling story about this technology’s capabilities. Deep Research scored an impressive 26.6% accuracy on Humanity’s Last Exam. Other models typically score below 10%. The system also analyzes hundreds of sources and provides clear documentation of its reasoning process. This sets a new standard for AI-assisted research.
The system’s patient approach to problem-solving makes it unique. It takes 5 to 30 minutes to complete requests and uses this time to conduct a thorough analysis from multiple sources. This thoughtful processing helps Deep Research create expert-level reports that match human analysts’ work.
Market analysis and consumer product research examples show real value in different scenarios. The applications benefit professional knowledge workers who need complete analysis and consumers making big purchasing decisions. These use cases span industries and personal needs of all types.
All the same, potential users should know about access limits. OpenAI first restricted it to Pro subscribers at $200 monthly. They have now expanded availability across Plus, Team, and Enterprise tiers with different monthly query limits. The company plans to add integration with custom data sources and enterprise tools.
The o3 reasoning model that powers this system sets a new standard for AI capabilities. Trained through reinforcement learning on browsing tasks, it knows how to plan multi-step research processes, adapt to new information, and combine findings into clear reports.
Deep Research stands as a crucial milestone in OpenAI’s journey toward more capable AI systems. The technology shows how removing artificial time constraints leads to better results for complex tasks, a lesson that will undoubtedly shape future AI development. This patient approach might become the standard for next-generation knowledge work tools as users come to value quality and reliability over speed.
FAQs
Q1. What is OpenAI’s Deep Research, and how does it differ from traditional search engines? Deep Research is an AI-powered tool that conducts comprehensive, multi-step research across the internet. Unlike traditional search engines that provide quick results, Deep Research takes 5-30 minutes to analyze hundreds of sources, synthesize information, and produce detailed reports with proper citations.
Q2. How does Deep Research improve the quality of its outputs? Deep Research improves output quality by removing latency constraints and allowing for longer processing times. This enables the AI to explore more potential solutions, verify information across multiple sources, perform self-correction, and generate more comprehensive and nuanced outputs.
Q3. What are some practical applications of Deep Research? Deep Research can be used for various tasks, including market analysis, academic research, and consumer product recommendations. It’s particularly useful for complex queries that require synthesizing information from multiple sources, such as analyzing mobile app markets or finding the best skis for specific conditions.
Q4. How does Deep Research compare to other AI models in terms of performance? Deep Research significantly outperforms other AI models on expert-level tasks. For example, it achieved 26.6% accuracy on Humanity’s Last Exam, a benchmark featuring challenging questions across over 100 expert-level subjects. This is substantially higher than the performance of other leading AI models.
Q5. Who can access Deep Research, and what are the current usage limits? Deep Research is available to OpenAI’s Pro subscribers, with plans to expand to Plus, Team, and Enterprise tiers. As of the latest update, Pro users have a monthly limit of 250 queries, while Plus/Team/Enterprise users have 25 queries. OpenAI is gradually increasing access and query allocations across different user tiers.