Hiring: Artificial Intelligence Specialist
Job Description
Position: Distributed Systems Engineer
Requirements
- Must-Have:
  - Deep knowledge of LLM inference (transformers, forward passes, KV cache, prefill/decode phases)
  - Proficiency in Python (and ideally Rust, Go, or C++)
  - Experience with GPU programming (CUDA) and inference engines (vLLM, Hugging Face, or similar)
  - Experience with distributed systems and high-performance networking
- Strongly Preferred:
  - Experience with decentralized systems, P2P networks, or blockchain-adjacent infrastructure
  - Familiarity with model parallelism techniques (pipeline parallelism, tensor parallelism)
  - Knowledge of compression algorithms for tensors/activations
  - Experience with real-time routing, service mesh, or orchestration systems (Kubernetes, Ray, etc.)
  - Strong systems thinking and performance engineering mindset
Key Responsibilities
- Design & Implement the Core Architecture:
  - Model partitioning into executable blocks/layers across distributed nodes
  - Latency-aware dynamic routing using real-time telemetry (latency, queue, GPU load, trust scores, etc.)
  - Adaptive layer-based replication with cost-benefit logic
  - Observable metrics pipeline (p95 latency, bottlenecks, failure rates, audit overhead)
- Build Production-Grade Features:
  - Request gateway, route planner, aggregator
  - Operator dashboards for visibility into routes, nodes, and blocks
  - Benchmarking framework against naive pipelines
- Technical Direction:
  - Start with smaller open-source models and scale to large models
  - Define MVP success metrics and iterate rapidly
Required Skills
- Artificial Intelligence
- Python
- PyTorch
Minimum Work Experience
- Three to six years
Gender
- No preference
Military Service Status
- No preference