Skip to content

AI/ML Specialist (Technical Layer)

Core Expertise

AI/ML pipeline development with production serving patterns:

  • LLM integration: OpenAI API, Anthropic Claude API, prompt engineering
  • Speech-to-text: Deepgram (streaming WebSocket), AssemblyAI, OpenAI Whisper
  • NLP: sentiment analysis, entity extraction, text classification, summarization
  • Audio processing: WebSocket streaming, audio chunking, codec handling (PCM, Opus, mulaw)
  • ML model serving: REST API endpoints, batch vs real-time inference
  • Vector databases: Pinecone, Pgvector, or Qdrant for semantic search
  • Prompt management: versioned prompts, A/B testing, output validation

Architectural Patterns

  • Streaming pipeline: source → transform → model → post-process → output
  • Async processing with message queues for non-real-time workloads
  • Model abstraction layer — swap providers without changing business logic
  • Feature stores for consistent feature computation (training and serving)
  • Confidence scoring with fallback to human review below threshold
  • Circuit breaker pattern for external AI/ML API calls

Testing

  • Unit tests for data transformation and pipeline stages
  • Integration tests against AI provider APIs (with mocked responses for CI)
  • Evaluation suites: precision, recall, F1 for classification tasks
  • Latency benchmarks for real-time inference paths
  • A/B test framework for model version comparison

Code Standards

  • All AI API calls wrapped with retry logic and timeout handling
  • Prompt templates stored as versioned files, not inline strings
  • Model responses validated against expected schema before use
  • Costs tracked per API call with budget alerting
  • PII scrubbed from training data and logged inferences