OUR PROJECT
The NVIDIA Triton Deployment Engineer will support the deployment of NVIDIA-accelerated AI models to edge devices in regulated, real-world environments. This role focuses on production-grade model deployment, automation, and lifecycle management, rather than research or model development. You will own the end-to-end process of converting, optimizing, packaging, deploying, and updating trained AI models on edge systems using NVIDIA Triton Inference Server and AWS-based automation.
This is a 100% remote contract position with competitive pay.
WHO WE ARE LOOKING FOR
We are looking for an NVIDIA Triton Deployment Engineer with deep experience operationalizing AI models in GPU and edge environments. The ideal candidate is comfortable managing the full deployment lifecycle, from model optimization to edge delivery, while automating workflows in the cloud. You have a strong understanding of Triton, TensorRT/ONNX integration, and GPU inference, and can collaborate effectively with ML, platform, and systems teams to deliver production-ready AI solutions.
We are interviewing qualified candidates immediately and will move into the offer stage quickly. If you are interested, please apply with an updated resume.
QUALIFICATIONS
- Hands-on experience with NVIDIA Triton Inference Server, including model repository structure, ensembles, versioning, TensorRT, and ONNX (a minimal repository sketch follows this list).
- Proven experience deploying NVIDIA-accelerated models to edge devices.
- Experience converting, optimizing, packaging, and deploying AI models for production inference (see the ONNX-to-TensorRT sketch below).
- Experience automating deployments using AWS (IAM, VPC, S3, KMS); see the S3 upload sketch below.
- Strong understanding of GPU-based inference and production deployment considerations (see the client smoke-test sketch below).
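To illustrate the repository structure referenced above: Triton expects one directory per model, containing a config.pbtxt and numbered version subdirectories. A minimal Python sketch follows; the model name (resnet50), platform, and batch size are illustrative assumptions, not project specifics.

    from pathlib import Path

    # Minimal Triton layout: <repo>/<model>/config.pbtxt plus one
    # numbered version directory holding the model artifact.
    repo = Path("model_repository")
    version_dir = repo / "resnet50" / "1"   # "1" is the model version
    version_dir.mkdir(parents=True, exist_ok=True)

    config = 'name: "resnet50"\nplatform: "tensorrt_plan"\nmax_batch_size: 8\n'
    (repo / "resnet50" / "config.pbtxt").write_text(config)
    # The serialized TensorRT engine (see the next sketch) would be
    # copied to model_repository/resnet50/1/model.plan before startup.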
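The converting and optimizing step usually means building a TensorRT engine from an ONNX export. Here is a minimal sketch using the TensorRT Python API (assumes TensorRT 8.x or later, where build_serialized_network returns a serialized engine); the file names and the FP16 flag are assumptions for illustration.

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)

    # Parse the ONNX export and surface any parser errors.
    with open("model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # common optimization on edge GPUs

    # Serialize the optimized engine for the Triton version directory.
    with open("model.plan", "wb") as f:
        f.write(builder.build_serialized_network(network, config))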
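On the AWS side, a common automation pattern is publishing repository artifacts to S3 with KMS encryption so edge devices can pull updates. A boto3 sketch; the bucket name and key prefix are placeholders.

    import boto3

    s3 = boto3.client("s3")

    # "aws:kms" asks S3 to encrypt the object with a KMS key (the
    # account default unless SSEKMSKeyId is also supplied).
    s3.upload_file(
        "model_repository/resnet50/1/model.plan",
        "example-model-artifacts",          # placeholder bucket
        "triton/resnet50/1/model.plan",
        ExtraArgs={"ServerSideEncryption": "aws:kms"},
    )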
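Finally, one practical production check is a client-side smoke test with the tritonclient package, confirming the server is live and the model answers requests. The input name, shape, and dtype below are illustrative and must match the deployed config.pbtxt.

    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")
    assert client.is_server_live() and client.is_model_ready("resnet50")

    # Dummy input; name/shape/dtype must match the model config.
    inp = httpclient.InferInput("input", [1, 3, 224, 224], "FP32")
    inp.set_data_from_numpy(np.zeros((1, 3, 224, 224), dtype=np.float32))

    result = client.infer(model_name="resnet50", inputs=[inp])
    print(result.as_numpy("output").shape)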
Effective written and verbal communication skills are required for this role. You must be legally authorized to work in the United States; no sponsorship will be provided. No third parties.