Safety at OCNexus

OCNexus is an autonomous development pipeline. Every request is classified by a safety layer before any code is written or any pipeline tokens are spent. This page documents what OCNexus will and won't build, and how the classifier works.

Design Principle: The classifier runs before the pipeline. If a request is blocked, no pipeline run starts, so no pipeline tokens are consumed and no pipeline cost is incurred. The only cost is the classification pass itself.

Prohibited Categories

Requests in these categories are always blocked. The user receives a specific decline message explaining why and what to do if the block was incorrect.

BLOCKED competitive_product

AI-powered development pipelines that automate issue-to-PR workflows, which is OCNexus's core product.

Decline message: "Building AI-powered development pipelines is outside what OCNexus processes — that's the core of what we do here. If you're building a different kind of automation tool, describe what it does and we'll take another look."

BLOCKED illegal_activity

Systems designed to facilitate fraud, unauthorized access, data theft, or any criminal activity.

Decline message: "This request describes functionality OCNexus won't build. See our usage policy for what's in and out of scope."

BLOCKED ai_policy_violation

Tools designed to circumvent Anthropic, OpenAI, or other AI provider usage policies.

Decline message: "This request involves circumventing AI provider usage policies, which OCNexus won't process. If you're building a legitimate AI integration, rephrase without the policy-bypassing aspect."

BLOCKED malware

Malicious code, vulnerability exploits, ransomware, or spyware.

Decline message: "OCNexus won't build malicious software, exploits, or tools designed to damage systems. If this was blocked incorrectly, contact safety@ocnexus.dev with your project ID."

BLOCKED deceptive_systems

Applications designed to deceive users about their nature or harvest data without disclosure.

Decline message: "Applications designed to deceive users about their nature are outside OCNexus scope. If you're building a legitimate chatbot or automated system, make sure it discloses its nature."

BLOCKED destructive_operations

Requests to delete production data, destroy infrastructure, or cause irreversible harm.

Decline message: "Requests to delete production data or destroy infrastructure are blocked as a safety measure. If you need to perform a controlled data operation, describe the specific scope and safeguards."
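
The prohibited categories above can be read as a fixed lookup from category name to decline message. The following is a minimal sketch of that idea, not OCNexus's actual implementation: the category names and message text come from this page, while `PROHIBITED_CATEGORIES`, `DECLINE_MESSAGES`, and `decline_message_for` are hypothetical names introduced for illustration. Only two of the six messages are included to keep the example short.

```python
# Hypothetical sketch: these names are illustrative, not OCNexus internals.
PROHIBITED_CATEGORIES = frozenset({
    "competitive_product",
    "illegal_activity",
    "ai_policy_violation",
    "malware",
    "deceptive_systems",
    "destructive_operations",
})

# Message text copied from this page; only two entries shown for brevity.
DECLINE_MESSAGES = {
    "illegal_activity": (
        "This request describes functionality OCNexus won't build. "
        "See our usage policy for what's in and out of scope."
    ),
    "malware": (
        "OCNexus won't build malicious software, exploits, or tools designed "
        "to damage systems. If this was blocked incorrectly, contact "
        "safety@ocnexus.dev with your project ID."
    ),
}


def decline_message_for(category: str) -> str:
    """Return the decline message for a prohibited category.

    Raises ValueError for categories that are not prohibited, and KeyError
    for prohibited categories whose message is omitted from this sketch.
    """
    if category not in PROHIBITED_CATEGORIES:
        raise ValueError(f"{category!r} is not a prohibited category")
    return DECLINE_MESSAGES[category]
```

Calling `decline_message_for("malware")` returns the malware message quoted above, while a restricted category such as `"auth_system"` is rejected because it is gated rather than blocked.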

Restricted Categories

Requests in these categories are not blocked but require human approval before the pipeline runs. The user sees a gate message and must explicitly approve.

GATED high_risk_financial

Changes to payment processing, payout logic, or subscription billing in production.

Gate message: "This request involves payment processing or billing changes. A human review of the spec is required before any code is written."

GATED auth_system

Modifications to authentication, authorization, or session management.

Gate message: "This request modifies authentication or authorization. A human review of the spec and implementation is required before merge."

GATED data_export

Any operation that exports or permanently deletes significant user data.

Gate message: "This request involves bulk data export or deletion. Human approval with a logged reason is required before proceeding."

GATED production_infrastructure

Railway, Supabase, DNS, or SSL changes in the production environment.

Gate message: "This request changes production infrastructure (Railway, Supabase, DNS, or SSL). Human approval is required."
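
To make the gating behavior concrete, here is a small sketch of how an approval record might be modeled. It is an assumption-laden illustration, not the OCNexus implementation: the `Gate` class, the `approve` function, and the field names are all hypothetical. The one rule taken from this page is that `data_export` approvals need a logged reason.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gate:
    """Hypothetical record of a human gate for one restricted request."""
    category: str
    approved_by: Optional[str] = None
    reason: Optional[str] = None

    @property
    def approved(self) -> bool:
        # The pipeline may proceed only once a named human has approved.
        return self.approved_by is not None


def approve(gate: Gate, approver: str, reason: Optional[str] = None) -> Gate:
    """Record an explicit human approval on a gate."""
    if not approver:
        raise ValueError("an approver is required")
    # Per the data_export gate message, approval needs a logged reason.
    if gate.category == "data_export" and not (reason and reason.strip()):
        raise ValueError("data_export approvals require a logged reason")
    gate.approved_by = approver
    gate.reason = reason
    return gate
```

A gate starts unapproved, so a caller that checks `gate.approved` before starting a run gets the "explicitly approve" behavior by default.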

How the Classifier Works

The safety classifier is a lightweight Claude pass that runs before the main pipeline. It takes less than 2 seconds and costs a fraction of a cent per classification.

  1. Request received: The user submits an issue or chat message.
  2. Safety classification: Claude classifies the request against the prohibited and restricted categories. A SHA-256 hash of the request text is logged; the text itself is not.
  3. Decision: Safe requests run through the pipeline normally, restricted requests wait at a human gate, and prohibited requests are blocked with a decline message.
  4. Escalation: If an account triggers five or more blocks within 24 hours, the account is flagged for manual review.
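
The decision and escalation steps above can be sketched as follows. This is an illustration under stated assumptions rather than the actual classifier: `decide` and `BlockTracker` are hypothetical names, a category of `None` is assumed to mean "safe", and the 24-hour window is assumed to be rolling (the page does not say whether it is rolling or calendar-based).

```python
from collections import deque
from datetime import datetime, timedelta, timezone
from typing import Deque, Dict, Optional

PROHIBITED = frozenset({
    "competitive_product", "illegal_activity", "ai_policy_violation",
    "malware", "deceptive_systems", "destructive_operations",
})
RESTRICTED = frozenset({
    "high_risk_financial", "auth_system", "data_export",
    "production_infrastructure",
})


def decide(category: Optional[str]) -> str:
    """Map a classifier category (None is assumed to mean safe) to a decision."""
    if category is None:
        return "run"
    if category in PROHIBITED:
        return "blocked"
    if category in RESTRICTED:
        return "gated"
    # Fail closed on labels this sketch does not know about.
    raise ValueError(f"unknown category: {category!r}")


class BlockTracker:
    """Flags an account after `threshold` blocks inside a rolling window."""

    def __init__(self, threshold: int = 5,
                 window: timedelta = timedelta(hours=24)) -> None:
        self.threshold = threshold
        self.window = window
        self._blocks: Dict[str, Deque[datetime]] = {}

    def record_block(self, account_id: str, when: datetime) -> bool:
        """Record one block; return True if the account should be flagged."""
        blocks = self._blocks.setdefault(account_id, deque())
        blocks.append(when)
        cutoff = when - self.window
        # Drop blocks that are a full window old or older.
        while blocks and blocks[0] <= cutoff:
            blocks.popleft()
        return len(blocks) >= self.threshold
```

A caller would invoke the pipeline only when `decide` returns `"run"`, or `"gated"` with a recorded approval, which keeps blocked requests from ever starting a run.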


Appeals

If you believe a request was incorrectly blocked:

  1. Check the decline message — it often explains how to rephrase the request.
  2. Rephrase the request to clarify your intent. A fraud detection tool is not fraud facilitation.
  3. If the block persists, email safety@ocnexus.dev with your project ID and the request text.

We review appeals within 24 hours and update the classifier if the block was incorrect.


Transparency

All safety classifications are logged and visible in the OCNexus dashboard under Safety → Recent Logs. Each log entry includes:

  • The category that was triggered
  • Whether the request was blocked or gated
  • The classifier's reasoning
  • A SHA-256 hash of the request text (the full text is not stored)
  • Timestamp

We do not store the full text of classified requests. Only the hash is retained for audit purposes.
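
As an illustration of a log entry that keeps only the hash, the sketch below builds a record with the five fields listed above. The `SafetyLogEntry` class, its field names, and `build_log_entry` are hypothetical, and encoding the request as UTF-8 before hashing is an assumption; the page specifies only that SHA-256 is used.

```python
import hashlib
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SafetyLogEntry:
    """Hypothetical shape of one safety log entry (no request text field)."""
    category: str
    outcome: str          # "blocked" or "gated"
    reasoning: str
    request_sha256: str
    timestamp: str        # ISO 8601, UTC


def build_log_entry(request_text: str, category: str,
                    outcome: str, reasoning: str) -> SafetyLogEntry:
    """Build a log entry that keeps the hash of the request, not the text."""
    if outcome not in ("blocked", "gated"):
        raise ValueError("outcome must be 'blocked' or 'gated'")
    # Assumption: the text is UTF-8 encoded before hashing.
    digest = hashlib.sha256(request_text.encode("utf-8")).hexdigest()
    return SafetyLogEntry(
        category=category,
        outcome=outcome,
        reasoning=reasoning,
        request_sha256=digest,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
```

Because SHA-256 is one-way, an auditor who is given a request text (for example, in an appeal email) can hash it and match it to a log entry, while the log alone does not reveal the text.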

Last updated: February 21, 2026. This policy may be updated as we refine the classifier. Changes are logged in the OCNexus changelog.