// ~/services/ai-ml

custom AI & ML.

Fine-tuned LLMs, RAG systems, agents, vision models. Built end-to-end, deployed to your stack, and yours to own. Beyond off-the-shelf APIs.

price	custom-quoted by scope (fixed, not hourly)
turnaround	typically 2-8 weeks
delivery	code + weights in your repo / cloud
nda	happy to sign — yours or mine
models	Llama, Mistral, Qwen (open) · Claude, GPT, Gemini (hosted)

// capabilities

seven capabilities. most builds combine two or three.

[01]llm fine-tuning// teach an open-weights base your domain, tone, and tasks
[02]RAG & retrieval// vector search, hybrid, citations — grounded in your docs
[03]agents & orchestration// tool use, planners, guardrails, multi-step workflows
[04]computer vision// detect, segment, classify — photos, scans, live video
[05]api & deployment// FastAPI / Modal / Vercel · auth, observability, rate limits
[06]evals & quality// regression tracking, online + offline eval pipelines
[07]cost & latency tuning// right-sized models, quantization, caching

// process

[01] discovery

map the problem, success criteria, data audit. one-page brief out at the end.

[02] data & design

collect, clean, label. pick architecture and the eval that says we're done.

[03] build & train

iterate on data/prompts, fine-tune, run evals weekly until quality crosses bar.

[04] deploy & monitor

ship to your stack with logging and eval-on-prod. clean handoff included.

// faq

Q: bring my own data?

A: helpful, not required. if you have it, we'll use it. if not — open dataset, synthesis, or a small collection phase.

Q: timeline?

A: 2-8 weeks end-to-end. focused RAG ships in 2; fine-tune with custom evals is 4-6.

Q: hosting & ongoing costs?

A: I deploy to your cloud so you own runtime. inference cost depends on model size + traffic — right-sized before shipping.

Q: open vs hosted models?

A: open-weights (Llama, Mistral, Qwen) when you want to self-host. hosted (Claude, GPT, Gemini) when latency/quality demands. often both.

Q: do I own everything?

A: yes. code, weights, fine-tuned artifacts ship to a repo + storage bucket you control.

// start a project

Send a paragraph. I'll respond within 24 hours.