Position Statement
This page describes how we use AI at Polaris, what happens to student data, where humans review what the system produces, and what we still haven't figured out. It assumes you're skeptical about AI in schools. We think that's a reasonable place to start.
If you have questions after reading, please get in touch through our contact page.
Figure: Where the model sits in the listening pipeline. Three stages: student speech (what gets recorded); what the model does (cleans the audio, finds patterns); what adults receive (read the report, listen to the audio). The model handles the volume so adults can spend their time listening and learning.
Most schools are full of conversations that should happen but don't. Teachers don't have time to sit with every student about what's working in their classes. Administrators don't usually hear, in any granular way, how the schedule change played out in seventh grade. Students don't get asked a serious open-ended question about their school and given the room to answer it. A 2025 study by Conner and colleagues found that opportunities for students to weigh in on the decisions that shape their day-to-day school experience are rarely measured with validated instruments (Conner et al., 2025). We see AI here as a way to make more of those conversations possible, not as a replacement for them.
The product is voice-first because the act of a student speaking to someone (a teacher, a peer, the room) is the thing we want more of. Listening sessions are designed for group discussion, where students answer the prompt with each other rather than into a form. The recording is the artifact; the conversation is the point.
The model sits between the audio and the people who need to hear it. A district running fifty listening sessions across three schools generates more student speech than any superintendent will ever sit with directly. Polaris summarizes, surfaces patterns, and gets specific student quotes in front of the adults who need to hear them, with the original audio one click away. Every quote in a published report links back to the actual recording. The report is a map; the recordings are the territory.
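A minimal sketch of what "every quote links back to the actual recording" can mean in data terms. It assumes a hypothetical schema in which each quote carries a recording id and a time span; the `Quote` class, the `unlinked_quotes` helper, and the sample values are illustrative and are not taken from the Polaris codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """A student quote as it might appear in a report (hypothetical schema)."""
    text: str
    recording_id: str      # which recording the quote came from
    start_seconds: float   # where the quoted speech begins in that recording
    end_seconds: float     # where it ends


def unlinked_quotes(quotes, known_recordings):
    """Return the quotes that cannot be traced back to a real span of audio."""
    problems = []
    for q in quotes:
        duration = known_recordings.get(q.recording_id)
        if duration is None:
            problems.append(q)   # the recording does not exist
        elif not (0 <= q.start_seconds < q.end_seconds <= duration):
            problems.append(q)   # the time span falls outside the audio
    return problems


# Recording id -> duration in seconds (illustrative values).
recordings = {"session-07": 1800.0}

report_quotes = [
    Quote("We never get asked about the schedule.", "session-07", 312.4, 316.9),
    Quote("Lunch is too short.", "session-99", 10.0, 12.0),    # unknown recording
    Quote("I like advisory.", "session-07", 1795.0, 1810.0),   # runs past the end
]

bad = unlinked_quotes(report_quotes, recordings)
print([q.text for q in bad])
```

In a design like this, a report with a non-empty `bad` list would be held back, so a reader is never shown a quote they cannot listen to.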
Some ed-tech vendors pitch AI as a closed loop: the system reads student input, decides what it means, and surfaces conclusions to the school without anyone in between. We think that's the wrong design. A substantial body of scholarship on automated decision-making in schools argues that when automation goes unchecked, it tends to entrench existing inequities rather than expose them (Selwyn, 2019; Williamson et al., 2020; Selbst et al., 2019). The risk isn't that the model is occasionally wrong; the risk is that wrongness becomes invisible at scale. The EU AI Act, which entered into force in August 2024 and reaches full application in August 2026, classifies education-facing AI as high-risk and mandates human oversight as a default requirement (European Parliament and Council, 2024). Polaris was built around that principle before the regulation existed.
Polaris is built around the assumption that a model's output is a draft, not a verdict. The current human review checkpoints in our pipeline:
Research on human-in-the-loop machine learning makes the same point: human review is only valuable when the humans involved have real authority and real time to use it (Mosqueira-Rey et al., 2023). Review that exists on a checklist but gets rushed past in production is theater. We try, imperfectly, to make the review steps load-bearing.
Figure: Two columns of commitments, with the contract behind each. Left column, "Stays inside Polaris": encrypted, anonymized, school-controlled. Right column, "Never happens": prohibited by charter or vendor contract. Each item on the right is locked into either our nonprofit charter or our commercial contracts with model vendors.
Polaris Education is operated by Human Restoration Project, a nonprofit organization. Our charter prohibits the sale or transfer of student data to third parties, and that prohibition survives dissolution, merger, or acquisition. There is no commercial configuration in which student records leave the platform for another buyer. The full text is in our privacy policy.
Student responses are not used to train AI models. The model providers we work with, including Anthropic for language model inference and ElevenLabs for voice processing, offer commercial agreements that exclude customer inputs and outputs from model training. This is the standard commercial term, not a custom carve-out, and it applies to every request Polaris sends to those vendors on behalf of a school. We retain the right to audit those agreements and renegotiate if a vendor changes terms in a way that puts student data at risk.
Our pipeline strips personally identifiable information out of every interview before the data lands in long-term storage or reaches a teacher's screen. That includes student names, adult names, addresses, and identifying detail combinations. Voice anonymization is available on request, so even the acoustic record can be processed without leaving the original speaker recognizable. PII detection isn't perfect; we publish the cases we've caught and the categories where slips still happen so districts can see what they're working with.
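As a rough illustration of the kind of step described above, the sketch below redacts names and street addresses from a transcript line. It is an assumption-laden toy, not the Polaris pipeline: the `redact` function, the roster-based name matching, and the single address pattern are all hypothetical, and a pattern-based approach like this will miss cases, which is one reason PII detection isn't perfect.

```python
import re

# Illustrative pattern for simple US-style street addresses, e.g. "42 Maple Street".
ADDRESS = re.compile(
    r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+){1,3}"
    r"(?:Street|St|Avenue|Ave|Road|Rd|Drive|Dr|Lane|Ln)\b"
)


def redact(text, roster):
    """Replace street addresses and roster names with placeholders."""
    text = ADDRESS.sub("[ADDRESS]", text)
    # Longest names first so "Maria Lopez" is replaced before "Maria".
    for name in sorted(roster, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        text = pattern.sub("[NAME]", text)
    return text


# Hypothetical roster of student and adult names supplied by the school.
roster = ["Maria Lopez", "Maria", "Mr. Chen"]
raw = "Maria Lopez told Mr. Chen she lives at 42 Maple Street."
print(redact(raw, roster))
```

A roster only catches names the school already knows about. Nicknames, misspellings introduced by transcription, and identifying combinations of details need other methods and human review.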
Polaris is FERPA, COPPA, and CIPA compliant, and was rated 95% by Common Sense Privacy in their independent evaluation. Customizable Data Processing Agreements are available for any district that needs one. Where the EU AI Act applies, education-facing AI is classified as high-risk, with provider obligations on data governance, transparency, and human oversight, and specific bans on emotion-inference systems in education that took effect in February 2025 (European Parliament and Council, 2024; Maslej et al., 2025).
Computational analysis of qualitative material has been around for decades, well before the current generation of large language models. Conversations about AI in education sometimes collapse the whole history into a single event, the public release of ChatGPT in late 2022, and miss what is actually happening underneath. Even as the field has kept changing in important ways since 2022, the methodological core of NLP still rests on statistical foundations laid decades earlier. The EU AI Act itself recognizes that continuity by defining "AI system" broadly enough to cover both older statistical techniques and current language models (European Parliament and Council, 2024).
The methodological foundation is older still. Glaser and Strauss's 1967 work on grounded theory established the tradition of systematically coding open-ended responses to surface emergent patterns (Glaser et al., 1967). The first generation of computer-assisted qualitative data analysis software (CAQDAS) emerged in the late 1980s and the 1990s, well before modern machine learning was practical at scale. Tools like NUD*IST, ATLAS.ti, and NVivo come from that era.
On the statistical side, latent semantic analysis (LSA) was published in 1990 and remains a working technique for modeling word-document associations (Deerwester et al., 1990). Latent Dirichlet Allocation (LDA), the topic-modeling workhorse of the 2000s and 2010s, was published in 2003 (Blei et al., 2003). Word embeddings, which represent words as vectors so that distance between vectors tracks semantic similarity, were popularized by word2vec in 2013 (Mikolov et al., 2013). Structural topic models extended LDA with covariates, so researchers could ask how a theme's prevalence varies across populations (Roberts et al., 2019). None of this required a language model in the contemporary sense.
Polaris uses both lineages. Vector embeddings, density-based clustering (HDBSCAN), and direct frequency counting do most of the structural work that turns thousands of student turns into a tractable set of themes. Large language models are useful at the labeling and synthesis stages (naming a cluster, drafting a writeup, suggesting where two themes might be merged), but they sit on top of older statistical machinery, not in place of it. When we describe Polaris as "AI-powered," we mean a stack that includes 1990s information-retrieval techniques alongside 2024-era language models, with each used where it's honest to use it.
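To make the non-LLM part of that stack concrete, here is a small sketch of density-based clustering followed by direct frequency counting. It makes several assumptions: the points are toy 2-D stand-ins for high-dimensional embeddings, and the clustering is plain DBSCAN written out by hand, a simpler relative of HDBSCAN that needs a fixed radius (`eps`). The function and variable names are illustrative only.

```python
import math
from collections import Counter

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: label each point with a cluster id, or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, p in enumerate(points) if math.dist(points[i], p) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        near = neighbors(i)
        if len(near) < min_pts:
            labels[i] = NOISE            # may be claimed later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in near if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:     # core point: keep expanding
                queue.extend(more)
    return labels


# Toy "embeddings" of student turns: two dense groups and one outlier.
turn_vectors = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (9.0, 0.0),
]

labels = dbscan(turn_vectors, eps=0.3, min_pts=3)

# Direct frequency counting: how many turns fall into each candidate theme.
theme_sizes = Counter(l for l in labels if l != NOISE).most_common()
print(labels, theme_sizes)
```

Nothing in this step involves a language model. Under the division of labor described above, a model would only be asked to suggest a name for each cluster once the grouping and counts already exist.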
Skepticism about language model output is the right default. Hallucination, where a model confidently generates content the input doesn't support, is well-documented at this point (Ji et al., 2023; Kalai et al., 2025). Recent work from OpenAI traces the persistence of hallucination to the way language models are trained and evaluated: standard benchmarks reward a confident wrong answer over an honest "I don't know" (Kalai et al., 2025). Bender and colleagues' 2021 "stochastic parrots" paper raised structural concerns about how training data gets curated and what fluent-but-wrong output costs the people who read it, and those concerns still hold (Bender et al., 2021). Buolamwini and Gebru's study of commercial face-classification systems demonstrated the depth of demographic bias in deployed AI systems (Buolamwini et al., 2018). The Model Cards proposal argued that any deployed AI system should ship with public documentation of its evaluation, intended use, and known failure modes (Mitchell et al., 2019). We agree with the substance of these critiques. A 2025 head-to-head study of LLM-led versus human-led thematic analysis found that the LLMs hallucinated in ways that ranged from single-phrase changes to truncations and recombinations that altered what students actually said. The authors concluded that LLMs aren't yet a substitute for experienced qualitative researchers (Mehta et al., 2025). This matches our experience.
Skepticism doesn't require abstinence. Language models are useful for specific, narrow jobs (transcript cleaning, theme labeling, drafting prose) when their output is treated as a draft and verified against ground truth before it leaves the system. The concrete safeguards we use:
Any single model's output is a guess with a confidence number attached. The job of a serious research tool is to design the system around that guess so it stays checkable, replaceable, and recoverable when wrong, instead of wrapping it in interface polish and calling it an answer.
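One way to keep a model-drafted quote checkable is to confirm that its words appear verbatim in the source transcript before it is published. The sketch below shows that idea under simple assumptions: `normalize` and `is_verbatim` are hypothetical helpers, matching ignores case and punctuation, and anything that fails the check is routed to a person instead of being silently corrected.

```python
import re


def normalize(text):
    """Lowercase and drop punctuation so formatting differences don't matter."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def is_verbatim(quote, transcript):
    """True only if the quote's words appear, in order and contiguously, in the transcript."""
    q = normalize(quote)
    # Padding with spaces forces matches to start and end on word boundaries.
    return bool(q) and f" {q} " in f" {normalize(transcript)} "


transcript = "Honestly, the new schedule is fine, but lunch feels really short now."

drafted_quotes = [
    "lunch feels really short now",          # verbatim
    "the new schedule is fine",              # verbatim
    "the new schedule is terrible",          # altered word
    "the new schedule feels really short",   # recombination of real phrases
]

for quote in drafted_quotes:
    status = "ok" if is_verbatim(quote, transcript) else "NEEDS HUMAN REVIEW"
    print(f"{status}: {quote}")
```

A check like this catches changed words and stitched-together phrases. It does not catch a truncation that is verbatim but misleading out of context, which is part of why the original audio still matters.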
This list isn't exhaustive. Each item is a known limitation of the current product.
We try to run our infrastructure on renewable energy wherever the underlying provider supports it.
Polaris runs on DigitalOcean's NYC3 servers, which sit inside a New York-area data center run by a company called Equinix. In its 2024 sustainability report, Equinix states that 96% of the electricity powering its data centers globally comes from renewable sources, the eighth year in a row above 90%. DigitalOcean itself doesn't publish a breakdown by region, so this is the most specific thing we can honestly say about the energy mix behind the building our servers live in.
GPU work (transcription, embeddings, and the heavier compute behind theme analysis) runs serverlessly on Modal. Modal doesn't publish a sustainability page of its own, but its primary infrastructure partner is Oracle Cloud, and Oracle reports matching 100% of its global electricity use with renewable energy in its 2025 fiscal year (including the US East region in Ashburn, Virginia, where most of these jobs actually run). "Matched with renewable energy" isn't the same as 24/7 carbon-free power: Oracle buys renewable energy credits and power purchase agreements (PPAs) equal to its total consumption. It is still a real and audited claim. On top of that, because Modal scales containers all the way down to zero between jobs, we only draw power during the seconds a transcription or embedding is actually running, not around the clock as a dedicated GPU would. For inference against third-party language models, we batch requests and cache results so the same content isn't processed twice.
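The batching and caching idea in the last sentence above can be sketched as follows. This is a minimal illustration under stated assumptions, not the production code: `InferenceCache` and `fake_batch_model` are hypothetical names, results are keyed on a SHA-256 hash of the input text, and the "model" is a local stand-in, not a real vendor API.

```python
import hashlib


class InferenceCache:
    """Send each unique text to the model at most once, in batches."""

    def __init__(self, run_batch):
        self._run_batch = run_batch   # callable: list of texts -> list of results
        self._results = {}            # content hash -> cached result
        self.batch_calls = 0          # how many batched requests were made
        self.items_sent = 0           # how many texts were actually processed

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def process(self, texts):
        """Return one result per input text, reusing cached results where possible."""
        keys = [self._key(t) for t in texts]
        # Unique texts not seen before, in first-seen order.
        missing = {}
        for key, text in zip(keys, texts):
            if key not in self._results and key not in missing:
                missing[key] = text
        if missing:
            outputs = self._run_batch(list(missing.values()))   # one batched call
            self.batch_calls += 1
            self.items_sent += len(missing)
            self._results.update(zip(missing.keys(), outputs))
        return [self._results[k] for k in keys]


def fake_batch_model(texts):
    """Stand-in for a remote inference call; not a real vendor API."""
    return [t.upper() for t in texts]


cache = InferenceCache(fake_batch_model)
first = cache.process(["lunch is short", "more electives", "lunch is short"])
second = cache.process(["more electives", "new topic"])
print(first, second, cache.batch_calls, cache.items_sent)
```

Five inputs arrive across the two calls, but only three texts are ever sent to the model, in two batched requests. A persistent store would be needed for the cache to survive between jobs; the in-memory dictionary here is only for illustration.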
We don't pretend any of this is carbon-free. Training and running large neural networks have measurable energy and water costs, and there's a substantial body of academic work on them (Strubell et al., 2019; Patterson et al., 2021). The IEA's April 2025 Energy and AI report projects that global data center electricity consumption will roughly double to about 945 TWh by 2030, and identifies AI inference at scale as a meaningful component of that growth (International Energy Agency, 2025). AI inference draws power, the grid is partly fossil-fueled, and the most we can do at our scale is choose smaller models when they're sufficient, prefer renewable-procuring providers, and avoid recomputing what we've already computed.
Skepticism about AI in schools is the right place to start. We also think these tools can do useful work when they're pointed at something specific: getting more people into honest conversation with each other, not fewer.
Our organization leans into a solarpunk philosophy: technology can serve people and the planet, and we can use fascinating technologies to better the world. Polaris is one small piece of that, a way to put student voice into the decisions schools actually make.
For the bigger picture, the Human Restoration Project publishes essays on systemic change in education and keeps a library of past conference sessions on the same questions.
Peer-reviewed and primary sources cited above. Where possible, links go to the publisher of record.