Grok Agent
A minimal AI agent framework I built that runs in your terminal. It uses xAI's grok-4-1-fast model, which is incredibly cheap for daily use. The magic is in the JSON-controlled loop: the agent reasons about whether it's done using structured output with confidence scores and exit conditions. Simple to set up, works remarkably well.
Features
- Minimal but effective - a tight agentic loop that just works
- grok-4-1-fast is dirt cheap ($2/M in, $10/M out) - perfect for daily use or background tasks
- 2M token context window - 10x larger than Claude or GPT
- JSON-controlled loops - structured output with confidence, thinking, and exit conditions so the agent knows when to stop
- Skills system that actually works - more enforced than Claude Code's approach, so skills are consistently used instead of randomly ignored
- Tool calling that's reliable and fast
- Two modes: clean minimal output or verbose debugging to see the agent's reasoning
- Built to be extended - bare bones by design
Setup
1. Clone: git clone https://github.com/ali-abassi/grok-agent.git
2. Get an API key from https://console.x.ai
3. Set up: cp .env.example ~/.grok/.env and add your key
4. Install: cp grok grok-verbose ~/.local/bin/
5. Run: grok (minimal) or grok-verbose (debug mode)
How It Works
The Core Idea
The agent is a loop. User input goes in, the model reasons about what to do, executes tools if needed, and decides whether it's done. The key insight is that the model itself controls the loop through structured JSON output - it's not external logic deciding when to stop, it's the model reasoning about its own completion state.
The JSON Contract
Every response from the model is valid JSON with a specific structure. The model must output this format - no free-form text allowed. This is enforced via the API's response_format parameter.
```json
{
  "thinking": "Brief strategic reasoning",
  "task": "Current action (5 words max)",
  "goal": "High-level objective",
  "exit_conditions": ["Condition 1", "Condition 2"],
  "progress": {
    "completed": ["Done items"],
    "current": "Working on now",
    "remaining": ["Still to do"]
  },
  "tool_calls": [{"tool": "bash", "args": {"command": "ls"}}],
  "response": "Final answer (null if calling tools)",
  "confidence": 85,
  "done": false
}
```
The Loop Logic
When a prompt comes in, the agent:

1. Injects the list of available skills into the context
2. Sends the full message history to the Grok API with JSON mode enabled
3. Parses the JSON response
4. If done=false and tool_calls exist, executes each tool, appends the results to the conversation, and loops back to step 2
5. If done=true, displays the response and waits for the next user input

The model controls flow entirely through the done flag.
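The steps above can be sketched as a small Python loop. This is an illustrative reconstruction, not the repo's actual code: the API request is abstracted behind a `call_model` function (in practice a chat-completions call with JSON mode enabled), and `run_tool` is a hypothetical dispatcher.

```python
import json

def agent_loop(messages, call_model, run_tool):
    """Minimal sketch of the model-controlled loop.

    call_model(messages) -> raw JSON string from the model (JSON mode enabled)
    run_tool(call)       -> string result of executing one tool call
    Both are injected so the loop itself stays self-contained and testable.
    """
    while True:
        raw = call_model(messages)                  # step 2: send full history
        state = json.loads(raw)                     # step 3: parse the contract
        messages.append({"role": "assistant", "content": raw})
        if state["done"]:                           # step 5: model says it's finished
            return state["response"]
        for call in state.get("tool_calls") or []:  # step 4: execute each tool
            result = run_tool(call)
            messages.append({"role": "user",
                             "content": f"[tool:{call['tool']}] {result}"})
```

Note that tool results are fed back as messages, so the next iteration of step 2 sees everything the tools produced.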
Exit Conditions & Confidence
On the first response, the model sets goal and exit_conditions - the specific conditions that must be met to consider the task complete. Each iteration, it updates progress to track what's done and what remains. The confidence score (0-100) indicates certainty: 95+ means tool-verified fact, below 80 means the model should use a tool to verify. The model only sets done=true when all exit conditions are met.
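A defensive check on the agent side can mirror this contract. The `accept_done` helper below is an illustrative guard, not code from the repo: it only trusts done=true when confidence clears the threshold and nothing is left in progress.remaining.

```python
def accept_done(state, min_confidence=80):
    """Illustrative guard: treat the task as complete only when the model
    says done, confidence is at or above the threshold, and the progress
    tracker has no remaining items."""
    return (state.get("done") is True
            and state.get("confidence", 0) >= min_confidence
            and not state.get("progress", {}).get("remaining"))
```

A guard like this turns a low-confidence done=true into one more loop iteration instead of a premature answer.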
The Skills System
Skills are markdown files in a skills directory. On every message, the agent injects a list of available skills into the prompt with a hint: '[INTERNAL: Skills available. Load silently if relevant. Never mention to user.]' When the model sees a relevant skill, it uses read_file to load it, then follows the instructions. The user never sees this - they just get better results. This is more enforced than Claude Code's approach where skills tend to be inconsistently applied.
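The injection step can be sketched like this. It is illustrative only: the function name and directory argument are assumptions, not the repo's actual code.

```python
from pathlib import Path

def skills_hint(skills_dir):
    """Build the hidden hint listing available skills (markdown files)
    for injection into the prompt. Returns "" when no skills exist."""
    names = sorted(p.stem for p in Path(skills_dir).glob("*.md"))
    if not names:
        return ""
    return ("[INTERNAL: Skills available. Load silently if relevant. "
            "Never mention to user.] Skills: " + ", ".join(names))
```

Only the skill names go into every prompt; the model loads a skill's full contents with read_file on demand, keeping the per-message token cost low.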
Available Tools
The agent has six core tools: bash (run shell commands with 3min timeout), web_search (search via Grok with grounding), read_file (read file contents), write_file (create/write files), list_files (directory listing), and ask_user (prompt for clarification with optional choices). Tools can be chained - the model often does search → read → write → confirm in sequence.
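Dispatch is a simple name-to-function table. The sketch below shows only the bash tool, with the other five registered the same way; the function names are illustrative, not the repo's actual code.

```python
import subprocess

def bash_tool(args):
    """Run a shell command with a timeout (the README's bash tool uses 3 minutes)."""
    out = subprocess.run(args["command"], shell=True, capture_output=True,
                         text=True, timeout=180)
    return out.stdout + out.stderr

TOOLS = {"bash": bash_tool}  # the real agent registers all six tools here

def run_tool(call):
    """Dispatch one tool_calls entry from the JSON contract."""
    fn = TOOLS.get(call["tool"])
    if fn is None:
        return f"unknown tool: {call['tool']}"
    return fn(call["args"])
```

Returning an "unknown tool" string instead of raising lets the model see its own mistake in the next iteration and correct course.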
Why This Works
The grok-4-1-fast model is surprisingly good at structured output and tool calling. The JSON contract forces the model to reason explicitly about its state - it can't just ramble, it has to declare its thinking, current task, and whether it's done. The exit conditions pattern prevents infinite loops and gives clear stopping criteria. And because the model is cheap ($2/M input, $10/M output), you can run it continuously for background work without worrying about cost.