Hiring: C++ AI Inference Engineer (HPC / Audio)

Job Description

We are looking for a highly skilled C++ AI Inference Engineer to bridge the gap between cutting-edge machine learning research and high-performance production systems. In this role, you will be responsible for taking complex PyTorch models and re-implementing, optimizing, and deploying them using high-performance inference backends such as NVIDIA TensorRT (nvinfer), ONNX Runtime, and GGML.

You won’t just be calling APIs; you will write highly optimized C++ code, apply High-Performance Computing (HPC) algorithms, and squeeze every drop of performance out of CPUs and GPUs. If you are obsessed with memory alignment, cache locality, SIMD instructions, and making AI models run in real time, we want you on our team.

### Key Responsibilities
* **Model Re-implementation & Porting:** Translate and reimplement complex PyTorch models into highly optimized, production-ready C++ inference pipelines.
* **Backend Integration:** Architect and integrate inference engines using TensorRT (nvinfer) for GPU acceleration, ONNX Runtime for cross-platform compatibility, and GGML for highly optimized CPU/quantized inference.
* **HPC Optimization:** Apply High-Performance Computing algorithms and techniques (SIMD/AVX, multithreading, memory pooling, lock-free data structures) to minimize latency and maximize throughput.
* **Profiling & Bottleneck Resolution:** Use advanced profiling tools (NVIDIA Nsight, perf, VTune, Valgrind) to identify memory and compute bottlenecks, writing custom CUDA/C++ kernels when necessary.
* **Quantization & Compression:** Implement advanced quantization techniques (INT8, FP8, INT4) to reduce model footprint while maintaining accuracy, particularly utilizing GGML's quantization formats.
* **Cross-Functional Collaboration:** Work closely with ML Researchers to ensure models are designed with inference efficiency in mind, and with Backend Engineers to seamlessly integrate the inference engine into our core product.

### Required Qualifications
* **Expert-Level C++:** Deep mastery of Modern C++ (C++14/17/20). You understand memory management at the hardware level, template metaprogramming, and zero-cost abstractions.
* **HPC & Systems Programming:** Strong background in High-Performance Computing. Experience with multi-threading, concurrency, CPU architecture, cache optimization, and SIMD (AVX2/AVX-512/NEON) vectorization.
* **Inference Framework Mastery:** Hands-on, deep experience deploying models using TensorRT (nvinfer), ONNX/ONNX Runtime, and GGML / llama.cpp. You understand the internal graph optimization and execution strategies of these engines.
* **Deep Learning Fundamentals:** Solid understanding of how PyTorch works under the hood (autograd, tensor operations, custom C++ extensions).
* **GPU Programming:** Strong proficiency in CUDA programming and understanding of GPU memory hierarchies and warp-level primitives.
* **Math & Algorithms:** Strong foundation in linear algebra, numerical methods, and HPC algorithms (e.g., optimized GEMM, FFT, convolution algorithms).

### Preferred Qualifications (Bonus Points)
* **Audio Processing Experience:** Familiarity with Digital Signal Processing (DSP), real-time audio pipelines, FFT, or audio-specific ML models (e.g., Whisper, AudioCraft, Voice Conversion, Noise Suppression). This is a massive plus for our specific use case.
* **Custom Kernel Development:** Experience writing custom CUDA kernels or C++ operators for PyTorch/ONNX to replace inefficient default operations.
* **Edge/Embedded AI:** Experience deploying models on edge devices, mobile (CoreML, Android NNAPI), or resource-constrained environments.
* **Distributed Inference:** Experience with tensor parallelism, pipeline parallelism, or vLLM/TensorRT-LLM for large-scale model serving.

### Tech Stack You’ll Work With
* **Languages:** C++ (Primary), CUDA, Python (for tooling/research).
* **Inference Engines:** TensorRT, ONNX Runtime, GGML, llama.cpp.
* **ML Frameworks:** PyTorch, LibTorch.
* **Tools:** CMake, Conan/vcpkg, NVIDIA Nsight, GDB, perf, Git.

### What We Offer
* Competitive base salary and equity package.
* [Health, dental, and vision insurance].
* [Flexible PTO and remote work options].
* Top-tier hardware budget (latest GPUs, workstations, etc.).
* The opportunity to work on the bleeding edge of AI inference and HPC.

Required Skills

  • C++
  • HPC
  • Python

Minimum Work Experience

  • No requirement

Gender

  • No preference

Military Service Status

  • No requirement

Employment Type:

Full-time

Posting Date:

1405/03/31 (Solar Hijri calendar)