With AI platforms — as with any SaaS — the devil is in the details. What’s possible, what’s not, what’s claimed versus what’s actually delivered, and what kind of impact a tool can have on teaching and learning depend almost entirely on how the platform was built, what assumptions it carries, and how customizable it is to your district.
The list below is the one we hand to superintendents, curriculum directors, and technology leaders evaluating AI platforms. It now incorporates the requirements districts are putting into their own AI tool approval checklists, AI addenda to student data privacy agreements, and state-specific privacy statutes. Use it in your next vendor meeting.
Based on our research, evidence-backed approaches, academic papers, and implementations, we built a list of 60+ questions to ask K-12 AI vendors.
If a vendor cannot answer most of these — clearly, specifically, without hand-waving — consider that a yellow, orange, or red flag.
Pedagogical Architecture & Assumptions
-
What pedagogical theories and learning frameworks are encoded into how your AI responds — Bloom's Taxonomy, Webb's DOK, UDL, Vygotsky's ZPD, cognitive load theory, others?
Why it matters: Every AI tool encodes a theory of learning whether the vendor names it or not. If they can't articulate it, it's still operating in your classrooms.
-
How does your platform support the full teaching lifecycle — plan, instruct, assess, analyze, iterate — versus optimizing only one step?
Why it matters: Tools that only support planning, or only support feedback, force you to layer more tools to fill the gaps.
-
How does your AI avoid reinforcing weak instructional practices when teachers use it as a shortcut?
Why it matters: An AI that makes a bad lesson faster is not a benefit. It's a risk multiplier.
-
How does your platform distinguish between recall, application, analysis, and synthesis — and how does it generate work at the appropriate DOK level for the standard being taught?
Why it matters: Without DOK awareness, the AI cannot tell the difference between knowing and understanding, and neither will your assessment data.
-
How does the platform handle productive struggle — when does it provide an answer versus a hint, a scaffold, or a question back?
Why it matters: AI that always answers is AI that bypasses learning. The architecture has to defend struggle, not collapse it.
-
How does the platform identify and remediate student misconceptions, rather than just marking work right or wrong?
Why it matters: Effective instruction targets the why behind a wrong answer. AI that only grades cannot teach.
-
How does the platform support gradual release of responsibility — modeling, guided practice, independent practice — within a single assignment or across a unit?
Why it matters: One-shot generation is not instructional design. Real teaching is sequenced, and your AI should be too.
-
How does the platform align to evidence-based instructional practices — explicit instruction, retrieval practice, spaced practice, formative assessment, feedback?
Why it matters: If the vendor cannot name the practices their tool supports, they are guessing at pedagogy.
-
How does your platform support culturally responsive and identity-affirming pedagogy in content generation, examples, and student-facing prompts?
Why it matters: Generic AI defaults to a single cultural reference frame. That is a quiet equity problem you cannot afford to ignore.
AI Model, Customization & District Context
-
What specific underlying AI models do you use — foundation model name, version, and provider — and have they been fine-tuned for K-12 instruction?
Why it matters: A generic foundation model wrapped in a chat interface is not the same as a model tuned on instructional data, standards, and pedagogy.
-
What is your fine-tuning methodology — supervised, RLHF, instruction-tuning, custom — and what data was used for training and evaluation?
Why it matters: How a model was trained determines what it does well, what it does badly, and what biases it carries. Vague answers here are red flags.
-
How often are models updated, what is your version-control process, and how do districts get notified when model behavior changes?
Why it matters: AI is not static. Districts deserve change management, not surprise behavioral shifts in their classrooms.
-
What district-specific documents can the AI ground in — curriculum maps, scope and sequence, instructional frameworks, AUPs, technology use policies, school improvement plans, IEP guidance, MTSS protocols?
Why it matters: If the AI does not know your district's documents, it is answering from generic web data and calling it instructional support.
-
How are district documents indexed and retrieved — what embedding model, what chunking strategy, what retrieval method, and how is grounding verified per response?
Why it matters: This is where most "RAG-based" AI tools fall apart in practice. Quality of retrieval determines quality of answer.
-
How does the AI cite its sources when it makes a claim or recommendation, and how can administrators audit those citations?
Why it matters: Citations turn the AI from a black box into a defensible answer. Without them, you cannot stand behind anything it produces.
-
What is the platform's measured hallucination rate on district-specific content, and how have you tested for it?
Why it matters: A vendor that cannot tell you their hallucination rate has not measured it. You will measure it for them, in front of parents.
-
Can the platform be customized at the district, school, grade, course, and teacher level — and how do those customizations propagate when a district document changes?
Why it matters: AI without granular customization is one-size-fits-all instruction in a chat box.
-
How does the AI behave when district documents conflict, are silent, or are out of date — does it refuse, default, escalate, or guess?
Why it matters: How a system fails is more informative than how it succeeds. Ask explicitly.
-
What specific AI methods does your platform use — Intelligent Tutoring Systems, adaptive learning, NLP, ML-based recommender systems, chatbots / LLMs, computer vision, speech recognition, translation, time series analysis, reinforcement learning, dimensionality reduction — and for what educational purposes is each used?
Why it matters: District AI data privacy addenda increasingly require vendors to itemize the AI methods used by purpose. A vendor that cannot list this in plain language is not ready to sign your AI schedule of data.
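One way to put a number behind the hallucination-rate question in this section is to have reviewers label a sample of AI responses against district documents and compute the share that is unsupported. The sketch below is a minimal illustration under assumptions: the `ReviewedResponse` record, its field names, and the sample data are hypothetical, and the Wilson score interval is just one common way to show how uncertain a rate is when the sample is small. It is not any vendor's actual evaluation method.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class ReviewedResponse:
    """One AI response checked by a human reviewer (hypothetical record shape)."""
    question_id: str
    cited_sources: tuple[str, ...]  # IDs of district documents the AI cited
    supported: bool                 # reviewer judgment: every claim is backed by a cited document


def hallucination_rate(reviews: list[ReviewedResponse]) -> float:
    """Share of reviewed responses the reviewer marked as unsupported."""
    if not reviews:
        raise ValueError("need at least one reviewed response")
    unsupported = sum(1 for r in reviews if not r.supported)
    return unsupported / len(reviews)


def uncited_rate(reviews: list[ReviewedResponse]) -> float:
    """Share of reviewed responses that cited no district document at all."""
    if not reviews:
        raise ValueError("need at least one reviewed response")
    return sum(1 for r in reviews if not r.cited_sources) / len(reviews)


def wilson_interval(failures: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = failures / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# Made-up sample: four reviewed responses, one of them unsupported and uncited.
sample = [
    ReviewedResponse("q1", ("curriculum-map-2024",), True),
    ReviewedResponse("q2", ("aup",), True),
    ReviewedResponse("q3", (), False),
    ReviewedResponse("q4", ("mtss-protocol",), True),
]
rate = hallucination_rate(sample)
low, high = wilson_interval(failures=1, total=len(sample))
print(f"hallucination rate: {rate:.2f} (interval {low:.2f} to {high:.2f})")
```

With only four reviewed responses the interval is very wide, which is the practical point: ask the vendor how many responses were reviewed, by whom, and against which district documents.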
Data Privacy, Security & Governance
-
Where does student and teacher data live (is it stored in jurisdictions that meet U.S. compliance requirements?), who has access, how long is it retained, and is any of it used to train your models or third-party models, or to develop synthetic or inferred data?
Why it matters: Most district AI addenda now explicitly prohibit student data being used for training, synthetic data generation, or inferred data — even when de-identified or aggregated. A vendor that cannot answer in plain English is not contractable.
-
Does the contract confirm that student data remains the property of the district — and that sub-licensing of student data is prohibited without explicit written permission from the LEA or eligible parents?
Why it matters: Data ownership is the single most contested clause in district AI agreements. The default should be that the data is yours, not the vendor's.
-
What student data does the platform collect — name, DOB, student ID, demographics, academic records, special education info, health information, biometric data, behavioral data, location, input data — and how is data minimization enforced so only necessary PII is collected?
Why it matters: Districts now require AI addenda to enumerate exactly which categories of student data are collected. Vague answers force the district to assume the worst — and to deny approval.
-
What FERPA, COPPA, PPRA, SOC 2, and state-specific (Washington RCW 28A.604 / RCW 42.56.590, California SOPIPA / AB 1584) compliance and certifications do you maintain — and where can we see the documentation?
Why it matters: Federal compliance is the floor. State student privacy statutes are where most vendors quietly fail.
-
Is data encrypted at rest and in transit, and what is the full list of subprocessors that touch student data?
Why it matters: Data flow is the story. A short list of subprocessors is good news; a long, vague list is a procurement risk.
-
What admin controls, audit logs, and guardrail configurations exist — and are they enforced at the model layer, not as a teacher-side reminder?
Why it matters: Policy enforced in a banner that teachers can ignore is not enforcement. The AI should behave correctly by default.
-
Do you have a written incident response and breach notification plan — and what is your notification SLA to districts after a confirmed incident?
Why it matters: Districts are obligated to notify families on a clock. A vendor without a written breach plan eats into that clock and puts the district at legal risk.
-
Does the platform make any automated educational, instructional, or employment recommendations to students — and if so, how is that disclosed to teachers, students, and families, and do you certify that no response is determined in whole or part by paid third-party consideration?
Why it matters: Washington's Student User Privacy in Education Rights statute (RCW 28A.604) allows recommendation features only when the response is not determined in whole or in part by payment or other consideration from a third party. The vendor must certify this in writing.
-
Do you commit, in the contract, to no targeted advertising to students and to no monetization of student data in any form?
Why it matters: This is a baseline contractual requirement in most state student privacy laws. A serious vendor commits to it without hesitation.
-
What is your process for notifying districts of material changes — new AI features, new subprocessors, new data collection, or changes to terms of service — and how much advance notice do districts get before changes take effect?
Why it matters: AI tools evolve silently. Most district AI approval checklists now require re-evaluation when ToS, AI features, or subprocessors change. Vendor advance notice is the only way that works.
-
What happens to district data at the end of the contract — return or destruction within 30 days, backup purging timeline, and what verification do you provide?
Why it matters: Year-one privacy promises are easy. End-of-contract data handling is where most vendors are quiet, and where many state privacy laws now place explicit obligations.
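The data minimization question in this section can be made concrete with an allowlist: only fields named in the district's approved schedule of data pass through, and anything else is dropped and reported. The sketch below is a minimal illustration, not a description of how any particular platform enforces minimization. The field names and the `APPROVED_FIELDS` set are hypothetical stand-ins for whatever categories your AI addendum actually enumerates.

```python
# Hypothetical allowlist; in practice this would mirror the categories
# enumerated in the district's AI addendum or schedule of data.
APPROVED_FIELDS = frozenset({"student_id", "grade_level", "input_text"})


def minimize(record: dict, approved: frozenset = APPROVED_FIELDS) -> tuple[dict, list]:
    """Return (kept, dropped): the allowed fields, plus the names of fields removed."""
    kept = {key: value for key, value in record.items() if key in approved}
    dropped = sorted(set(record) - approved)
    return kept, dropped


# Made-up incoming record that carries more PII than the allowlist permits.
incoming = {
    "student_id": "S-1001",
    "grade_level": 7,
    "input_text": "How do I factor x^2 - 9?",
    "date_of_birth": "2012-03-14",
    "home_address": "redacted",
}
kept, dropped = minimize(incoming)
print("kept:", sorted(kept))
print("dropped:", dropped)
```

Logging the dropped field names (never their values) gives administrators an audit trail of what the platform tried to collect, which is the kind of evidence worth asking a vendor to show.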
Student Learning, Growth & Cognitive Development
-
How does the platform support student growth and learning, not just task completion?
Why it matters: Productivity-first AI optimizes for finishing the worksheet. Learning-first AI optimizes for what happens in the student's head.
-
What specific safeguards prevent students from outsourcing their thinking to the AI?
Why it matters: Cognitive offloading is the single biggest risk of AI in education. The vendor must have a deliberate answer, not a deflection.
-
Does the platform scaffold student work by Depth of Knowledge (DOK) level, and does it track student progression across DOK levels over time?
Why it matters: Without DOK awareness, the AI cannot tell the difference between recall and analysis — and neither can your data.
-
How does the AI promote metacognition — does it prompt students to explain their reasoning, plan their work, or reflect on their understanding?
Why it matters: Metacognitive prompts are one of the highest-leverage instructional moves AI can make. Their absence is telling.
-
How does the platform identify a student's zone of proximal development and adjust difficulty in real time?
Why it matters: Static AI delivers the same response to every learner. Effective AI adjusts to the student in front of it.
-
How does the platform track standards mastery and longitudinal student growth — across units, semesters, and years?
Why it matters: A snapshot is a vanity metric. Longitudinal data is the basis for instructional decisions.
-
How does the AI distinguish between a student who is stuck and a student who is learning slowly — and what does it do in each case?
Why it matters: Same surface behavior, very different instructional needs. AI that cannot tell them apart will harm both.
-
How does the platform support student self-regulation, goal-setting, and reflection — versus delivering content at them?
Why it matters: Self-regulated learners outperform tutored learners over time. AI that does not build that muscle is borrowing future learning.
-
How does the platform support long-term retention — spaced practice, retrieval practice, interleaving — rather than only performance on the assignment in front of the student?
Why it matters: Students who pass the quiz and forget by Monday have not learned. The platform's architecture should reflect the science.
Research Foundation & Evidence Base
-
Where did your platform come from — who built it, and with what research foundation?
Why it matters: AI tools built first and sold into education second carry a different risk profile than tools built with educators and researchers from day one.
-
What independent, peer-reviewed evidence exists that your platform improves teaching practice or student outcomes?
Why it matters: Marketing claims are not evidence. Districts are entitled to ask for studies, not testimonials.
-
Was your platform built with school districts and education researchers in the room, and is it backed by federal research funding (NSF, IES, or other U.S. Department of Education programs)?
Why it matters: Federal funding is one of the few external signals that a tool was built with rigor, not just speed.
-
What research methodologies do you use to validate your platform's instructional claims — and have you published RCTs, quasi-experimental studies, or longitudinal effectiveness data?
Why it matters: A vendor claiming "research-backed" without rigor behind it is using the word as marketing, not as substance.
-
Who are the named education researchers, learning scientists, and former district leaders on your team, and what is their visible role in product design?
Why it matters: A learning-science team with names you can look up is a different signal than a vague reference to an "advisory board."
Preparing AI-Ready Graduates & Academic Integrity
-
How does your platform prepare students for an AI-enabled workforce — not by replacing their thinking, but by building it?
Why it matters: Your graduates will enter a labor market in which AI fluency is a baseline. The platform should be visibly contributing to that fluency.
-
How does your platform make it harder, not easier, for students to cheat — and how do you preserve student authorship and accountability?
Why it matters: An AI tool that turns assignments into generated artifacts undermines the entire purpose of the assignment. The architecture has to defend integrity by default.
-
Does the platform build AI literacy — teaching students how AI works, when to use it, when not to, and how to evaluate its outputs critically?
Why it matters: AI literacy is the new digital literacy. The tool itself should be part of the curriculum.
-
How is student work product attributable — what evidence does the platform retain that a student's thinking, not the AI's, produced the final artifact?
Why it matters: A teacher cannot grade what they cannot see. The platform should make student thinking visible, not hidden behind a chat interface.
Equity, Fairness & Uneven Outcomes
-
How does the AI avoid creating unfair advantages or widening achievement gaps across student subgroups?
Why it matters: Generic AI tools risk widening every gap they touch. The vendor must show that they have tested for this, not just hoped it isn't happening.
-
What testing have you done to ensure the AI performs comparably for English learners, students with IEPs, and students from historically underserved communities?
Why it matters: Ask for specifics. Bias and quality drift across populations are measurable phenomena, and a serious vendor will have measured them.
-
How does the platform support MTSS, differentiation, and intervention — not just one-size-fits-all responses?
Why it matters: If the AI cannot differentiate, it cannot serve the students who most need it to.
-
How does the platform support students with IEPs and 504 plans — does it read and operate within their accommodations automatically?
Why it matters: Accommodations as a manual teacher task are accommodations that get forgotten. The platform should know.
-
Is your platform regularly audited (at a defined cadence) for biases and fairness in AI decision-making — and can you produce documentation of those audits and the results?
Why it matters: State AI addenda increasingly require vendors to certify regular bias audits with documented results. A vendor without an audit cadence has not measured what it claims.
-
What documented strategies do you implement to identify and mitigate discriminatory effects in AI decision-making — and can you produce examples of bias issues you have actually surfaced and remediated?
Why it matters: A vendor that has never found a bias issue has never looked. Ask for messy real examples, not a polished statement.
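The subgroup testing and audit questions in this section come down to a comparison a vendor should be able to show: the same quality check, broken out by student group, with the gaps made explicit. The sketch below is a minimal illustration under assumptions. The subgroup labels, the idea of a reviewer marking each AI response as acceptable or not, and the sample data are all hypothetical; a real audit would use larger samples and a defined rubric.

```python
from collections import defaultdict


def subgroup_pass_rates(results):
    """Compute the share of acceptable AI responses per subgroup.

    results: iterable of (subgroup, passed) pairs, where `passed` is a
    reviewer's yes/no judgment that the AI response met the quality rubric.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for subgroup, passed in results:
        totals[subgroup] += 1
        passes[subgroup] += int(passed)
    return {group: passes[group] / totals[group] for group in totals}


def largest_gap(rates):
    """Difference between the best-served and worst-served subgroup."""
    if not rates:
        raise ValueError("no subgroups to compare")
    return max(rates.values()) - min(rates.values())


# Made-up review results: (subgroup label, reviewer judged response acceptable).
reviewed = [
    ("general_education", True),
    ("general_education", True),
    ("general_education", True),
    ("general_education", False),
    ("english_learner", True),
    ("english_learner", False),
]
rates = subgroup_pass_rates(reviewed)
print(rates)
print("largest gap:", largest_gap(rates))
```

A single gap number hides sample size, so it is reasonable to also ask how many responses were reviewed per subgroup and what the vendor did after finding a gap.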
Multilingual Support & Accessibility
-
What languages does your platform support — for student-facing AI tutoring, teacher-facing tools, and content generation — and at what reading levels?
Why it matters: Language access is an equity issue. "Supports translation" is not the same as "provides instructional support in a student's home language."
-
Are speech-to-text, text-to-speech, and other accessibility supports built into the platform, rather than bolted on as an add-on or third-party integration?
Why it matters: Accessibility built in is durable. Accessibility bolted on breaks under load.
-
Can you provide a current VPAT (Voluntary Product Accessibility Template), and what level of conformance with WCAG 2.1 AA and Section 508 does it document?
Why it matters: The VPAT is the industry-standard accessibility disclosure. A vendor without a current VPAT (completed within the last 12 months) has not done the work, or has done it and won't show it.
-
Has your platform been tested with the assistive technologies your students already use — VoiceOver, JAWS, NVDA, screen magnifiers, switch access — and what gaps were identified and remediated?
Why it matters: WCAG compliance is the floor. The real question is whether the tool actually works for the students who depend on assistive technology.
Administrator & District Leadership Tools
-
What dedicated workflows exist for principals, curriculum directors, and superintendents — distinct from the teacher-facing tools?
Why it matters: A teacher tool sold to leaders is not a leadership tool. Districts need their own surfaces, not a different login to the same interface.
-
Does the platform support school improvement plan development aligned to state requirements and evidence-based research?
Why it matters: The school improvement plan is one of the highest-stakes documents a district produces. If the AI can't help here, it isn't supporting district leadership.
-
Can the platform produce school-level and district-level AI impact reports, including which teachers are using it, how they are using it, and how that usage connects to student outcomes?
Why it matters: If you cannot see AI usage and connect it to student growth, you cannot defend the AI investment to your board.
-
Does the platform include enrollment forecasting, budget allocation, and resource planning tools that connect AI usage and outcomes to district priorities?
Why it matters: Strategic planning is where leadership lives. AI that stops at the classroom door is not a leadership platform.
-
Can the platform support communications to parents, board members, and the public — board reports, parent-facing summaries, community updates — grounded in district data?
Why it matters: Communication is leadership work. AI that can defend itself to a board in the leader's voice is a different category of tool.
Teacher Coaching, Feedback & Evaluation
-
Does the platform provide structured feedback to teachers on their instructional practice — and connect their AI usage patterns to student growth?
Why it matters: AI usage by itself is a vanity metric. AI usage tied to outcomes is professional development.
-
Can the platform support teacher evaluation or professional growth cycles aligned to our district's instructional framework — Danielson, Marzano, or a local framework?
Why it matters: If the AI is silent on the instructional rubric the district has already adopted, it is operating outside the district's existing systems.
-
Can principals and instructional coaches see classroom-level AI usage patterns and use them to plan coaching cycles?
Why it matters: Coaching at scale requires data. The platform should be feeding instructional leadership, not running parallel to it.
Implementation, Training & Long-Term Partnership
-
What does implementation actually look like — is this a classroom-by-classroom install, an individual-teacher signup, or a true institutional rollout across roles?
Why it matters: Tools that only deploy classroom-by-classroom never become district infrastructure. They stay anecdotal forever.
-
Does the platform support the full set of K-12 use cases — personalized learning, instructional support, automated grading and feedback, learning gap identification, special education, engagement, administrative support, parental engagement — or are you fundamentally a single-use tool that will force us to layer other AI tools to fill the gaps?
Why it matters: Most district AI addenda now require vendors to itemize every purpose of AI use across the platform. A single-purpose vendor will quickly become five overlapping vendors and a procurement and privacy nightmare.
-
What does training look like — fully self-serve, lightly-supported, or backed by a dedicated implementation team that knows education?
Why it matters: Self-serve training works for some districts. For most, the difference between adoption and shelfware is the depth of the vendor's training team.
-
What integrations are supported with our LMS, SIS, SSO, and identity systems — and how are those integrations maintained over time?
Why it matters: A demo that does not integrate with your stack is a demo, not a deployment.
-
What does ongoing partnership look like — year two, year three? How do you support districts in deepening instructional impact, not just product support?
Why it matters: A vendor whose support function ends at Tier 1 ticketing will not be a strategic partner. Ask what later years look like.
-
What is your process for incorporating district feedback into product development — and what concrete features in your roadmap came from district leaders?
Why it matters: A vendor who builds with districts will name the districts and the features. A vendor who builds for them will not.