Location: Jersey City / Remote
Experience: 6+ years
Employment Type: Contract
Position Overview
We are seeking an experienced AI/ML Data Scientist to architect, deploy, and maintain advanced machine learning systems at scale. The ideal candidate is highly proficient in distributed training, MLOps, and ML model serving infrastructure, and will play a key role in shaping and scaling enterprise AI platforms.
This is a hands-on engineering role focused on deep learning, containerized ML deployment, and real-time inference systems in a cloud-based environment.
Key Responsibilities
- Architect and implement distributed training strategies using frameworks like Horovod and DeepSpeed.
- Design and deploy containerized ML models via Docker, Kubernetes, and modern serving platforms like TensorFlow Serving, TorchServe, and Seldon Core.
- Develop robust model monitoring, drift detection, and logging systems to ensure reliability in production.
- Lead CI/CD pipeline development and enforce MLOps best practices for rapid and safe model iteration.
- Optimize inference pipelines for low-latency real-time serving and high-throughput batch workloads.
- Integrate models with distributed data storage platforms and vector databases to support scalable feature retrieval.
- Troubleshoot and debug complex distributed AI/ML systems for performance and scalability.
- Collaborate across data engineering, DevOps, and product teams to deliver business-ready AI solutions.
Must-Have Qualifications
- 6–10 years of experience in AI/ML, with deep expertise in training and deploying ML models in production.
- Strong Python skills with experience in NumPy, SciPy, and machine learning libraries.
- Proficient in deep learning frameworks: TensorFlow, PyTorch, and associated ecosystems.
- Experience with data orchestration tools such as Airflow, Kubeflow, or similar.
- Hands-on knowledge of feature engineering platforms (e.g., Feast, Tecton).
- Solid background in distributed computing using Apache Spark, Dask, or similar platforms.
- Strong experience with containerization and orchestration via Docker and Kubernetes.
- Familiarity with model serving frameworks: TensorFlow Serving, TorchServe, or Seldon Core.
- Proven ability to implement model monitoring and concept drift detection pipelines.
- In-depth understanding of data formats and serialization: Parquet, Avro, Protocol Buffers.
Preferred Qualifications
- Prior experience deploying AI/ML platforms in insurance, finance, or healthcare environments.
- Knowledge of vector search and real-time feature stores for high-performance retrieval.
- Exposure to multi-cloud or hybrid cloud ML infrastructure.
- Contributions to open-source AI/ML tooling are a plus.
Application Process
Submit your updated resume and a brief portfolio or summary highlighting your AI/ML deployments, infrastructure contributions, and specific experience with distributed model training and serving.