Location: Santa Clara, CA
Experience: 8+ Years
Employment Type: Full-time / Contract
Position Overview:
NovaSoft is seeking a highly experienced Databricks Data Engineer with strong DevOps expertise to design, implement, and optimize large-scale Lakehouse architectures on AWS.
This role requires a deep architectural understanding of compute vs. serving layer separation, low-latency data/API access strategies, and multi-terabyte data processing. The ideal candidate combines hands-on engineering excellence with technical leadership: a true player-coach mindset.
You will work closely with cross-functional teams to build scalable, secure, automated, and high-performance data platforms using modern DevOps practices.
Key Responsibilities:
- Design and implement scalable Databricks Lakehouse architectures on AWS
- Build and optimize ETL/ELT pipelines using PySpark, Spark, and SQL
- Implement Delta Lake best practices (partitioning, optimization, schema evolution)
- Develop and manage CI/CD pipelines and automated deployments using DevOps tools
- Optimize Spark workloads for performance, cost efficiency, and low-latency access
- Implement data governance and security using Unity Catalog
- Collaborate with cross-functional teams and provide technical leadership
Technical Skills (Required):
- Strong hands-on experience with:
  - Databricks (Delta Lake, Unity Catalog, Delta Live Tables, Workflows, Runtime)
  - PySpark, Spark, Advanced SQL
  - Lakehouse & Medallion Architecture
- AWS expertise including:
  - S3, IAM, Glue / Glue Catalog
  - Lambda
  - Secrets Manager
  - Kinesis (a plus)
- DevOps expertise:
  - Git-based workflows
  - CI/CD pipelines
  - Databricks Asset Bundles
  - Terraform (preferred)
- Experience handling multi-terabyte workloads
- Strong understanding of performance tuning, partitioning, and storage optimization
Preferred Experience:
- Structured Streaming and real-time or near-real-time data pipelines
- Advanced Databricks runtime configuration
- Exposure to GitLab CI/CD pipelines
Certifications (Optional):
- Databricks Certified Data Engineer (Associate / Professional)
- AWS Certified Data Engineer or Solutions Architect