Article at a Glance

The widespread adoption of GenAI, spearheaded by ChatGPT, is changing university education. For instance, traditional assignments like business case reports or multiple-choice quizzes are now at much greater risk of reflecting only AI usage skills rather than students' actual learning. To see how this can be overcome, I co-developed a "Socratic AI" agent that asks questions first and guides students through a dialogue, and trialled it in a graduate-level course. A Socratic AI agent can help clarify what a student really understands about the course content by continuing to ask questions based on the student's previous responses, and it can also be utilized to promote critical thinking in the process. The development process required considerable effort and time to articulate the instructor's tacit knowledge explicitly, in a form the AI agent can understand and implement. However, this process allowed both instructor and students to reflect on one of the fundamental goals of university education: fostering critical and integrative thinking.
Since ChatGPT's emergence on November 30, 2022, university education has faced an upheaval. This is evident in the courses I teach at the business school as well, such as International Business and Strategic Management. One of the core assignments in these courses has been the "business case analysis". Generally, these assignments require students to analyze success and failure factors in real business cases using knowledge learned in class and to present their conclusions in a written report. Provided that students write the reports themselves, instructors can expect to measure students' academic achievement by evaluating both their understanding of course content and their ability to apply it in analysis to build an argument, and then to organize and communicate that argument logically in writing.
However, the landscape changed with the rapid popularization of generative AI (GenAI) services. Now students can obtain a plausible-looking report in mere minutes, or even seconds, with a single prompt like the following:
"Hi! I need to submit a case analysis report for my strategic management class by tomorrow. I’ve attached the business case I need to analyze below. Could you write a 10-page report analyzing the company’s current strategy with a future strategy proposal?"Using AI to produce reports this way is analogous to traditional types of academic misconduct, such as having someone else write papers or copying others’ work. When students “outsource” these reports to AI, the results do not reflect students’ learning achievement, depth of thinking, or writing ability. But for university educators, the problem goes beyond “cheating has become easier.” The proliferation of GenAI services seems to signify that existing assignments and teaching methods may no longer be sufficient to achieve their intended learning outcomes.
But if the use of GenAI is analogous to traditional academic integrity violations, couldn't we just prohibit its use like other forms of cheating? Here's the dilemma: GenAI services are already ubiquitous in a way that traditional cheating methods never were, and they offer functionality far beyond ghost-writing or answering quiz questions. This makes prohibition highly impractical, and there is insufficient justification to ban AI use entirely just to prevent ghost-writing. Moreover, with companies moving quickly to adopt AI, banning its use entirely in university classes, especially in business schools, might do students a disservice by disconnecting them from reality. But we cannot simply ignore the problem either. If reliance on AI to do assignments goes unguided and unchecked, it may seriously undermine students' chance to learn independent and critical thinking by taking away the necessary trial-and-error process. If no countermeasures are taken, this risks rendering much of the time spent in university classes meaningless.
This calls for university education, and educators, to change in step with the popularization of GenAI. One of the greatest values of new technologies like GenAI lies in "making the previously impossible possible." New technologies often allow us to approach existing work in a new way and change how we do things. This formed the basis for my experiment to replace the traditional multiple-choice quiz with a Socratic AI agent co-developed with DeepSkill, a Korean-Australian startup that specializes in such AI applications.
Why Socratic AI?

One advantage of GenAI is its ability to provide personalized one-on-one services at scale. Chatbots commonly used in classes and businesses today are prime examples of attempts to leverage this advantage. In particular, GenAI can make chatbot services more diverse and sophisticated while enabling more "human-like" conversations.
However, such chatbots are inherently limited for use in university courses, especially for graded assignments. The biggest limitation is that they cannot ask students questions first. Or rather, while they can easily be made to start with a question, their primary function is still providing answers to students' questions. They are therefore very limited in following up with additional questions based on student responses, in the way an instructor would in an in-class discussion aimed at finding out what the student knows and conveying a point. While custom GPTs with detailed instructions may achieve something similar, it is not easy to reliably create sophisticated dialogue that responds to varied situations.
This is where Socratic AI comes in. Socratic AI refers to AI services based on the Socratic method of dialogue that helps students find answers themselves through questioning rather than simply feeding them knowledge. Popular GenAI services like ChatGPT, Claude, and Gemini passively respond to user requests and focus on answering accurately when they do.
In contrast, Socratic AI actively begins the dialogue by asking questions first and continues with follow-up questions based on user responses. In business strategy courses, for instance, instead of the user initiating questions, the AI agent might start by asking, "What do you think is the meaning of business strategy as discussed in this class?" While this might seem like a minor difference, it can have strong implications in educational settings. By having the agent ask questions first, it can control the direction of the conversation. And all subsequent Q&A builds on how the student answers the first question, allowing the focus to remain on what is deemed important in the course. In other words, while general-purpose AI services like ChatGPT serve as an "encyclopedia" or an "assistant" answering student questions, Socratic AI acts as a "facilitator" or "discussant" guiding student discussions in course-appropriate directions.

The Socratic approach can provide a personalized and directed learning experience by tracking students' thoughts and expressions. Given an AI agent well-trained with background knowledge, it can help confirm how well students understand concepts learned in class and whether they have an integrated understanding connecting different concepts, as it can be made to keep asking questions about the points the student has not clarified well enough. This makes the AI agent potentially a better tool than traditional quizzes for measuring how well students have absorbed course content, by creating an environment similar to a one-on-one discussion between the student and the instructor. Recognizing these strengths, I wanted to verify whether we could replace traditional multiple-choice or short-answer quizzes with a Socratic AI agent. I decided to experiment with this replacement in a graduate-level course focused on technology and innovation management, which ran from June to August 2025. The agent would focus on evaluating how well students understood the weekly course content, including the interconnections between concepts within and across the weeks.
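To make this question-first behaviour concrete, here is a minimal sketch of what such a dialogue loop could look like, assuming an OpenAI-style chat-completions API. The model name, system prompt wording, and turn limit are illustrative placeholders, not the configuration of the agent we actually built.

```python
# Minimal sketch of a question-first ("Socratic") dialogue loop.
# Assumes the OpenAI Python SDK (>= 1.0); the system prompt, model
# name, and turn limit are illustrative placeholders only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = """You are a Socratic teaching assistant for a business
strategy course. You speak first and ask exactly one question per turn.
Start by asking what the student thinks business strategy means as
discussed in class. After each answer, ask a follow-up that probes
whatever the student left vague or unconnected. If the student drifts
off topic, acknowledge the idea briefly and steer back to the current
concept. Never lecture and never answer your own questions."""

def socratic_session(max_turns: int = 5) -> None:
    """Run a short console-based Socratic dialogue."""
    # Unlike a standard chatbot, the agent opens the conversation.
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Please begin the discussion."},
    ]
    for _ in range(max_turns):
        reply = client.chat.completions.create(model="gpt-4o",
                                               messages=messages)
        question = reply.choices[0].message.content
        print(f"\nAGENT: {question}")
        messages.append({"role": "assistant", "content": question})
        # The student's answer becomes the basis for the next question.
        answer = input("STUDENT: ")
        messages.append({"role": "user", "content": answer})

if __name__ == "__main__":
    socratic_session()
```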
Development Challenges: From Tacit to Explicit Knowledge

If the AI agent merely evaluated whether students remembered concept definitions well, it would be no different from existing quizzes. The new form of assessment needed to evaluate whether students had an integrated understanding of course content by connecting multiple concepts. This required the AI agent to respond flexibly to student answers in various contexts and to continue meaningful dialogue about course content through personalized questions. Simply feeding lecture materials into the agent wasn't sufficient for this. The AI agent needed to understand not only individual concept definitions but also how these concepts connect and what greater meaning can be derived from those connections.

For example, consider the topic of "external analysis" in a business strategy course. The concept itself isn't very complex: it mainly consists of analyzing external factors that could affect a company, such as economic conditions and competitors. However, for a Socratic AI agent to properly discuss this topic with students, it must be able to explain the concept in connection with other concepts within the overall course context. The AI needs to understand the overall flow and how students' answers fit into it, so it can ask questions appropriate to the level of each student's answer.
Development and testing began in February 2025, about four months before the course. Initially, I thought enhancing the AI agent’s understanding wouldn’t be too difficult.
However, as development progressed, I realized there was a sizable gap between what I knew in my head and schematizing that knowledge so AI could utilize it. The process of logically designing what questions to ask students and how to continue with follow-up questions based on their answers was entirely different from simply conveying what I knew. It was a process of translating my tacit knowledge into explicit knowledge: into "information" that AI could understand. In other words, for AI to simulate human work in the way I wanted, it was essential to codify and transfer my knowledge and know-how for doing that work. And this task becomes more challenging in cases requiring specialized knowledge.
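To illustrate what this codification amounted to, below is a hypothetical sketch of a single concept entry written out explicitly. The field names and contents are invented for illustration rather than taken from our actual schema, but each field corresponds to something an instructor normally carries only in their head.

```python
# Hypothetical example of turning tacit teaching knowledge into an
# explicit, machine-readable concept entry. The schema and wording are
# illustrative, not the actual format used in the project.
EXTERNAL_ANALYSIS = {
    "concept": "external analysis",
    "definition": ("Analyzing external factors that could affect a "
                   "company, such as economic conditions and competitors."),
    # How the concept links into the rest of the course.
    "connects_to": ["internal analysis", "strategy as testable theory"],
    # Gaps an instructor notices instantly but must spell out for AI.
    "common_gaps": [
        "lists external factors without linking them to the firm's strategy",
        "confuses external factors with internal capabilities",
    ],
    # Follow-up questions keyed to those gaps.
    "follow_up_prompts": [
        "How would this analysis change the strategy you described earlier?",
        "Which of these factors could falsify the firm's strategic 'theory'?",
    ],
}
```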
Socratic AI's Strengths: Leading the Dialogue and Learner-Centered Design

Socratic AI's strength lies in its ability to maintain control over the focus and direction of the dialogue. For example, while discussing topic A, a student might drift to topics B or C, which are unrelated to A. In such cases, general AI services are likely to say "I don't understand" and come to a halt, or to give completely wrong answers trying to match what is being said. In contrast, a well-designed Socratic AI agent can bring students back to the core topic even when they go off on a tangent. For instance, it might respond: "That's an interesting and fresh perspective, but it seems a little distant from the core points of our current discussion. We are discussing A—could you think about it again and share your thoughts?" This is made possible by training the Socratic AI agent with the relevant background knowledge as well as the logical structure within that knowledge, fed into the agent in the form of a "knowledge graph". This knowledge graph visualizes how relevant concepts are integrated and connected, and which concepts are most important.
For example, [Figure 2] shows how the logical structure for defining strategy is built into a knowledge graph. The uppermost chain of boxes progresses from viewing strategy as based on the firm's purpose → the means to achieve that purpose → strategy as a testable "theory" for achieving the end goal → intended strategy turning into emergent strategy through iterations in practice → critical thinking as a necessary component. Such a schematized knowledge graph becomes the foundation for the AI agent to understand the relationships between concepts and the logical structure of the discussion.
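A chain like this can be encoded directly as a small directed graph. The sketch below uses networkx purely for illustration: the node labels paraphrase [Figure 2], while the actual knowledge graph was considerably denser, with many more cross-links between concepts.

```python
# Illustrative encoding of the [Figure 2] chain as a directed graph.
# Node labels paraphrase the figure; the real knowledge graph was
# much denser than this sketch.
import networkx as nx

G = nx.DiGraph()
chain = [
    ("firm's purpose", "means to achieve it"),
    ("means to achieve it", "strategy as testable theory"),
    ("strategy as testable theory", "intended vs. emergent strategy"),
    ("intended vs. emergent strategy", "critical thinking"),
]
G.add_edges_from(chain, relation="leads to")

# Cross-links are what make the graph more than a linear outline.
G.add_edge("external analysis", "strategy as testable theory",
           relation="supplies assumptions for")

# Concepts with many connections are the ones the agent probes hardest.
for concept, score in sorted(nx.degree_centrality(G).items(),
                             key=lambda kv: -kv[1]):
    print(f"{score:.2f}  {concept}")
```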
Another strength of Socratic AI is that it relies much less on user proficiency. That is, users do not have to be well-trained in prompting to utilize it fully. This matters because students are likely to lack the ability to ask good questions about content they have only just started learning. The quality of answers from a general-purpose GenAI, however, depends greatly on the quality of the user's prompts. For instance, there will be a significant difference between the answers to "I need to write a report on why business is important. Write me a 7-page draft!" and "This report will be used for discussion in a university social science liberal arts class, with the main audience being students... The core topic is examining why business is important in modern society. I think the essential content should include [○○○/△△△/×××]. For reference, please consider [○○○○○○] when drafting." To get high-quality output from general GenAI services, users must be able to clearly recognize and articulate what they need. With general GenAI services, "garbage in, garbage out" would likely persist in student interactions.
In contrast, Socratic AI takes a different approach. Because it begins with a question and is designed to continue with clarifying and follow-up questions, it relies much less on the quality of user input. Even if students do not understand a question clearly and answer vaguely, the agent continues with additional questions based on the responses provided, allowing students to learn progressively through the discussion. [Figure 1] demonstrates the difference between general dialogue and Socratic AI dialogue. With general GenAI services, answers will likely be vague without clear and specific questions from users. With Socratic AI, which asks questions first and requests additional clarification about parts not clearly explained, students can more easily identify what they cannot explain and what to ask. Seen from the instructor's perspective, this dialogue process also allows the AI agent (and thus the instructor) to identify what students do and do not understand about the content being discussed.
Providing Value to Both Students and Instructors

One of the most encouraging findings from this experiment was the student response. Despite the Socratic AI agent being an unfamiliar service that directly affected their grades, the overall student response was quite positive. Students had already accepted AI services as an unavoidable reality and welcomed attempts to provide better learning experiences through them. Consequently, there was much positive feedback about "experiencing AI in new ways through the class". Another key appeal of Socratic AI for students was that it "felt much more like talking to a person", even compared to existing GenAI services. This was an unintended outcome of the experiment, as making the AI agent more human-like had not been a goal.
It seems that students felt the approach of AI asking questions first and continuing the discussion based on their responses was more like a real conversation than the format where students ask questions and AI provides answers. That students accepted the service positively was partly influenced by the technology-and-innovation focus of the course. Before deploying the AI agent, I asked students to approach it from an innovation perspective and emphasized that any innovation process can involve unexpected problems and difficulties. While some issues did arise during implementation, students generally accepted these as a necessary part of innovation rather than as system failures. This taught me that establishing the right interpretive framework can be critical when trying new approaches with students.
From the instructor's perspective, using a Socratic AI agent helped improve feedback quality while enhancing learning efficiency. Traditional multiple-choice quizzes stopped at checking whether answers were correct (perhaps with some explanation of why a given choice was correct). Even for short-answer or essay-style questions, detailed feedback was limited by time and resource constraints. Using a Socratic AI agent, however, enabled more detailed evaluation of student responses in a much shorter time, based on the logical structure built in through the knowledge graph. It was premature to fully entrust marking to AI due to ethical concerns, student resistance, and accuracy issues. But the agent was able to provide a detailed analysis and summary comparing dialogue content with the information recorded in the agent, such as the knowledge graph, showing what aligned well and what was missing. Based on reviewing the dialogue content and the agent's analysis, I could provide students with detailed feedback and finer-grained scores in a shorter amount of time.
[Figure 3] shows part of the information the AI agent provided through its analysis of student responses based on the knowledge graph. Bar thickness indicates how central a concept is in the logical structure, and bar length shows how many students used it well in their explanations. The figure shows aggregated scores for all students, but the same view was available for each individual student. According to it, students generally understood the concept of "corporate strategy as testable theory" and its connection to technological development well, but relatively few utilized the concepts of intended versus emergent strategy. In short, the figure shows how well students understood individual concepts and the connections between concepts in the course content. The detailed, quantified analysis from the agent allowed me to provide more meaningful feedback without missing important aspects where students could improve.
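The two quantities behind such a chart can be approximated quite simply: a structural-importance measure for each concept (degree centrality in the graph sketched earlier) and the share of students whose dialogue used the concept. The reconstruction below is hypothetical, with naive substring matching standing in for the agent's far more careful analysis.

```python
# Hypothetical reconstruction of the [Figure 3] analysis: concept
# centrality stands in for bar thickness, and the share of students
# who used a concept stands in for bar length. The transcripts and
# the substring matching are placeholders, not the real pipeline.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("external analysis", "strategy as testable theory"),
    ("strategy as testable theory", "technological development"),
    ("strategy as testable theory", "intended vs. emergent strategy"),
])

# One (fabricated) dialogue transcript per student.
transcripts = {
    "student_1": "the theory links strategy to technological development...",
    "student_2": "our external analysis of competitors tests the theory...",
}

centrality = nx.degree_centrality(G)          # proxy for bar thickness
for concept in G.nodes:
    used = sum(concept in text for text in transcripts.values())
    coverage = used / len(transcripts)        # proxy for bar length
    print(f"{concept:35s} centrality={centrality[concept]:.2f} "
          f"coverage={coverage:.0%}")
```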
AI Agents to Foster Critical Thinking

Creating high-quality AI agents for educational purposes requires close collaboration between capable developers and instructors, plus considerable effort to schematize knowledge. While this effort started as an attempt to replace quizzes, the real utility came from reflecting on key pedagogical approaches throughout the development process. Carefully designing a student evaluation tool led me to refine and clarify core course content and to reconsider what I ultimately want to convey to students, and what I want students to take away from my course.
Introducing Socratic AI to my courses ultimately led to deep reflection on teaching methods, prompting me to ask again the fundamental questions of "what to teach" and "how to deliver it." I do not believe, for instance, that the theoretical frameworks covered in a business strategy course should be its final goal, despite their importance as fundamental knowledge. Most frameworks covered in undergraduate-level strategy courses are intended to simplify and formalize external and internal situations for easier understanding and analysis. While they are useful tools for making complex, multi-layered reality more digestible, they cannot be applied the way mathematical formulas can. For instance, while innovation strategies can be categorized into incremental, architectural, disruptive, and radical innovations, actual corporate strategies may not always fit neatly into just one of these four types. Rather, understanding the core content and classification criteria of these types helps clarify the direction and goals of a firm's strategy, which is useful for devising next steps. Thus, the ultimate goal of business strategy courses isn't memorizing theories but cultivating "critical thinking based on knowledge and experience." Students should be able to apply learned theories to real cases while viewing and examining the derived results with a critical eye. They should consider the limitations of the analysis and take broader implications for society and the environment into account.
GenAI, despite its current faults, has shown potential to strongly support such educational goals rather than to hinder them.
In traditional classes, students had to spend a large portion of their time memorizing theoretical frameworks and doing basic analysis, leaving little time for critical assessment. GenAI can now handle simple tasks such as initial data collection and draft writing with greater efficiency. This allows instructors to require students to go beyond the basic tasks and critically analyze and examine AI-derived results. It can also give students more time to reflect on their own work and outcomes, enabling them to learn more effectively from the feedback provided. Furthermore, Socratic AI enables simulating educational approaches like "one-on-one meetings", previously impossible or impractical due to time and resource limitations. This means AI agents can be utilized to guide student learning and evaluate thinking processes, opening new paths for enhancing the quality of education.

Current Challenges and Future Opportunities

GenAI services have already become an integral part of our everyday lives. University educators are aware of this and are trying hard to turn it into an opportunity, looking for where it can help innovate the learning experience while minimizing negative side effects. Although it is still unclear whether AI will replace all human jobs in the near future, it seems clear that the ability to critically utilize AI will be one of the important skills for students going forward. I believe Socratic AI presents a pathway for using AI services to enhance the student experience and promote critical thinking.
But many challenges remain despite the promise that Socratic AI agents have shown. First, developing agents perfectly tailored to specific courses requires a substantial investment of time and effort from the instructor, at least initially. Creating sophisticated AI agents that can simulate one-on-one discussions with the instructor requires densely formed knowledge graphs, and building them demands significant time and effort. When discussing the Socratic AI agent with colleagues, many expressed keen interest but were reluctant to invest the time required for development. Moreover, instructors may feel uncomfortable converting their tacit knowledge into an explicit form recorded in the AI's system, where it may be used outside their control. This reluctance can grow with security concerns, as instructional content and know-how could leak externally.
Even if such development difficulties are overcome, unexpected issues can arise during actual implementation. For instance, seemingly minor settings in how the agent behaves, such as whether to limit student response length, can make considerable differences to the student learning experience. If responses are limited to a short length, students' ability to convey meaning concisely becomes an implicit evaluation criterion; without any length limit, the ability to find and marshal more content becomes more important. In addition, having enough technological resources to deal with unexpected service failures is important, as AI agents can malfunction and unexpected problems can arise during implementation.
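To make the weight of these seemingly minor settings visible, here is a hypothetical configuration fragment. Every field name is invented for this sketch; none comes from the actual system.

```python
# Hypothetical agent configuration, showing how a small setting such
# as response length changes what is implicitly being assessed.
# All field names are invented for illustration.
AGENT_CONFIG = {
    # With a short cap, concise expression becomes an implicit
    # evaluation criterion; with no cap, breadth of recalled content
    # matters more instead.
    "max_student_response_words": 150,
    "max_follow_up_questions": 4,    # bounds the session per concept
    "off_topic_turns_tolerated": 1,  # tangents allowed before redirecting
    # Graceful degradation for the unexpected service failures
    # mentioned above.
    "fallback_message": ("The service is temporarily unavailable; "
                         "your progress has been saved."),
}
```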
Another important issue is whether to use AI agents in marked assignments or in class activities. Using them for evaluation requires a more focused approach, with detailed marking criteria and constraints on the scope of student discussions and responses. Ethical and accuracy issues need careful consideration when entrusting AI with student evaluation and grading. Using AI agents as part of class activities would impose fewer constraints, enabling more diverse attempts; however, this would require greater agent flexibility, increasing the size and complexity of the knowledge graph and demanding more development time and effort.
Despite these practical limitations, I believe my attempt at using a Socratic AI agent was overall successful and positive. It allowed for greater participation from the students and enhanced the relevance of what we study in the course, while maintaining the intended learning outcome.
Looking ahead, it would be fair to be concerned that AI might someday completely replace human instructors through the development of more sophisticated knowledge graphs and agents across all disciplines.
However, I believe it still falls to human instructors to set the direction for AI agents and their development, and to evaluate whether AI-powered results are appropriate. This is especially true in future scenarios where various specialized agentic AIs are developed and individuals must work like team leaders managing multiple specialized agents performing various tasks. In such a scenario, the key question will not be whether AI or humans work better, but how humans can interact and work with AI agents to create greater value. This makes cultivating critical and integrative thinking even more central to university education, and Socratic AI agents have shown great potential for aiding this purpose.
After completing bachelor's and master's degrees in International Business and Strategic Management at Seoul National University, the author received a PhD in Strategy and Organization Theory from Pennsylvania State University. The author is currently an assistant professor at UNSW Sydney, with core interests in corporate governance, social networks, digital transformation, and sustainability. Recently, the author has been exploring AI's impact, with a keen interest in how it influences businesses and business education.
Copyright Ⓒ Dong-A Business Review (동아비즈니스리뷰). All rights reserved. Unauthorized reproduction, redistribution, and use for AI training are prohibited.