Design and ship agentic systems (tool calling, multi-agent workflows, structured outputs) that reliably fetch, extract, and normalize data across the web and APIs.
Build and operate search/indexing pipelines on OpenSearch/Elasticsearch (schema design, analyzers, reindex/data migration strategies, relevance tuning).
Own robust web scraping: directory crawling, CAPTCHA handling, headless browsers, rotating proxies, anti-bot evasion, and backoff/retry policies.
Develop backend services in Python + FastAPI with clean contracts and strong observability.
Integrate third-party APIs for enrichment and search; model and cache responses; manage schema evolution.
Transform and analyze data using Pandas (or similar) for normalization, QA, and reporting.
Pitch in across the stack, including billing (Stripe) and occasional front-end changes, to ship end-to-end features.
Minimum requirements
Hands-on experience with agentic architectures (tool calling, structured outputs/JSON, planning/execution loops) and prompt engineering.
Deep knowledge of OpenSearch/Elasticsearch: index design, analyzers, ingestion pipelines, snapshots, rolling upgrades, and zero-downtime reindexing/data migrations.
Proven web scraping expertise: solving CAPTCHAs, handling session/auth flows, proxy rotation, and stealth techniques, with awareness of legal/ethical constraints.
AWS + Docker in production (at least two of: ECS/EKS, Lambda, SQS/SNS, Batch, Step Functions, CloudWatch).
Experience building high-throughput data/IO pipelines with concurrency (asyncio/multiprocessing), resilient retries, and rate-limit-aware scheduling.
Experience integrating diverse external APIs (auth patterns, pagination, webhooks) and designing stable interfaces and backfills.
Strong data wrangling with Pandas or equivalent; comfort with large CSV/Parquet workflows and memory/perf tuning.
Familiarity with Stripe (subscriptions, metered billing, webhooks) and basic front-end changes (React/TypeScript or similar).
Strong ownership, product sense, and pragmatic debugging skills.
Nice to have
Entity resolution/record linkage at scale (probabilistic matching, blocking, deduping).
Experience with Langfuse, OpenTelemetry, or similar for tracing/evals; task queues (Celery/RQ), Redis, Postgres.
Search relevance (BM25/vector/hybrid), embeddings, and retrieval pipelines.