How Arkentec-SHULA Keeps AI Tutoring 100% Grounded in Your Course
A deep look at the retrieval, ranking, and verification pipeline that prevents hallucinations and keeps every answer tied to the instructor's own material.
Ask a general-purpose AI tutor a question about your course, and it will almost always give you an answer. Confident, fluent, beautifully formatted. The problem is that the answer was generated from a vast, opaque training corpus that the instructor never approved, never edited, and often never even saw. Sometimes it's right. Sometimes it's subtly wrong in ways that are hard to catch. And sometimes — more often than vendors like to admit — it's pure fabrication.
This is the hallucination problem, and in higher education it isn't a minor annoyance. A tutor that invents a definition, misattributes a theorem, or quietly contradicts the lecturer doesn't just confuse one student. It erodes the trust that makes the entire tool worth using.
Arkentec-SHULA was built around a single, stubborn conviction: an AI tutor for a course should know only what the course knows. Not the open internet. Not a frozen snapshot of Wikipedia from 2023. Not the aggregated opinions of a billion Reddit comments. The instructor's lectures, slides, PDFs, problem sets, recorded videos, and curated readings — and nothing else.
The architecture in one sentence
Every student question is answered by retrieving passages from the course corpus, ranking them for pedagogical relevance, generating a response that cites those passages, and verifying that the response is actually supported by them before it ever reaches the student.
That sentence hides a lot of engineering. Let's open it up.
Step 1 — Ingestion: turning a course into a searchable mind
When an instructor connects a course, Arkentec-SHULA ingests every artifact: lecture slides, PDFs, Word documents, problem sets, lab manuals, recorded lectures (transcribed with timestamps), and any linked readings the instructor explicitly approves. Each artifact is chunked into semantically coherent passages — not arbitrary 500-token windows, but units that respect headings, slide boundaries, and natural paragraph breaks.
Each chunk is embedded into a high-dimensional vector and indexed alongside structured metadata: which week, which lecture, which slide number, which timestamp in the video, which problem in the set. This metadata is what makes citations precise later on.
Step 2 — Retrieval: finding the right passages, not just similar ones
When a student asks "why does the residual in a least-squares fit have to be orthogonal to the column space?", Arkentec-SHULA doesn't just grab the top-k nearest embeddings. It runs a hybrid retrieval: dense vector search for semantic similarity, sparse keyword search for technical terms the embedding model might smooth over, and a metadata filter that prefers material from the current week and the prerequisite weeks before it.
The result is a candidate set of passages that are not only topically relevant but pedagogically appropriate — material the student has actually been taught, in the order the instructor intended.
Step 3 — Ranking: choosing what the model is allowed to see
Retrieval is recall-oriented. Ranking is precision-oriented. Arkentec-SHULA's ranker scores each candidate passage on three axes: semantic fit to the question, alignment with the active pedagogical mode (Socratic, Hint-First, or Direct), and recency in the course timeline. Passages that fail a minimum threshold are dropped entirely.
If nothing in the corpus passes the threshold, Arkentec-SHULA does not fall back to general knowledge. It tells the student, plainly, that this question isn't covered by the course material, and offers to flag it for the instructor. This is the single most important behavior in the system, and the one most other AI tutors get wrong.
Step 4 — Generation: constrained, cited, and auditable
Only the surviving passages are sent to the language model, along with a system prompt that does three things: instructs the model to answer only from the provided passages, requires inline citations for every factual claim, and forbids the model from drawing on outside knowledge even when it could.
Citations are first-class. Every claim links back to the exact slide, the exact PDF page, the exact video timestamp. A student reading the answer can click any citation and land directly on the source. An instructor auditing the tutor's behavior can do the same.
Step 5 — Verification: catching the model when it drifts
Even constrained models occasionally drift. Arkentec-SHULA runs a verification pass on every generated response: a separate, smaller model checks each claim against its cited passage and flags any sentence that isn't actually supported. Flagged sentences are either rewritten or removed before the student ever sees the response.
This pass adds a few hundred milliseconds of latency. It also catches the long tail of subtle hallucinations that constrained generation alone doesn't prevent. We think the trade is obvious.
What this feels like for a student
A student asks a question. The tutor answers in the voice and style their instructor chose. Every claim has a citation they can click. If they push on something the course doesn't cover, the tutor admits it instead of inventing an answer. Over a semester, students learn to trust the tutor — and, more importantly, to verify its sources, which is a skill that outlasts any single course.
What this feels like for an instructor
An instructor uploads their material once. They choose a pedagogical mode. They review an audit log that shows, for any answer the tutor has ever given, exactly which passages it drew from. They can see which topics generate the most confusion, which sections of a lecture get replayed most often, and which questions the tutor refused to answer because the course didn't cover them — a remarkably useful signal for what to add next semester.
Grounded is an architecture, not a guardrail
Plenty of vendors have started slapping the word "grounded" onto products that are really just a general-purpose chatbot with a retrieval step bolted on the side. The model still has access to its full training corpus. The system prompt politely asks it to prefer the retrieved passages. When the retrieved passages are thin, the model fills in from memory, and the hallucinations come right back.
Arkentec-SHULA is built the other way around. The course corpus isn't a hint to the model — it's the only world the model is allowed to operate in. Retrieval, ranking, citation, and verification aren't optional features. They're the system. Remove any one of them and Arkentec-SHULA stops being Arkentec-SHULA.
That's the bet: that an AI tutor worth deploying in a real classroom has to earn the instructor's trust on day one, and keep earning it every day after. Grounded architecture is how we do that.
