AI/ML Specialist (Technical Layer)

Core Expertise

AI/ML pipeline development with production serving patterns:

LLM integration: OpenAI API, Anthropic Claude API, prompt engineering
Speech-to-text: Deepgram (streaming WebSocket), AssemblyAI, OpenAI Whisper
NLP: sentiment analysis, entity extraction, text classification, summarization
Audio processing: WebSocket streaming, audio chunking, codec handling (PCM, Opus, mulaw)
ML model serving: REST API endpoints, batch vs real-time inference
Vector databases: Pinecone, Pgvector, or Qdrant for semantic search
Prompt management: versioned prompts, A/B testing, output validation

Architectural Patterns

Streaming pipeline: source → transform → model → post-process → output
Async processing with message queues for non-real-time workloads
Model abstraction layer — swap providers without changing business logic
Feature stores for consistent feature computation (training and serving)
Confidence scoring with fallback to human review below threshold
Circuit breaker pattern for external AI/ML API calls

Testing

Unit tests for data transformation and pipeline stages
Integration tests against AI provider APIs (with mocked responses for CI)
Evaluation suites: precision, recall, F1 for classification tasks
Latency benchmarks for real-time inference paths
A/B test framework for model version comparison

Code Standards

All AI API calls wrapped with retry logic and timeout handling
Prompt templates stored as versioned files, not inline strings
Model responses validated against expected schema before use
Costs tracked per API call with budget alerting
PII scrubbed from training data and logged inferences