Data Infrastructure Engineer
Salary:
$240,000 - $280,000 per annum
Locations:
San Jose, CA, United States
Type:
Permanent
Published:
October 13, 2025
Contact:
Cassi Benson
Ref:
18788
Required Skills:
Python

Job Title: Data Infrastructure Engineer
Salary: $240K – $280K + Equity
Location: San Francisco, CA (Hybrid)

A leading AI company developing large-scale training systems for multimodal data is hiring a Data Infrastructure Engineer to build and optimize distributed pipelines powering next-generation model training. In this role, you’ll design and operate data systems that handle massive structured and unstructured datasets, ensuring high-quality, scalable, and reliable data delivery for AI research and production.

You’ll work across ingestion, transformation, and storage, collaborating closely with ML researchers to prepare and manage diverse datasets for pre-training and fine-tuning. This is an opportunity to own core data infrastructure end-to-end, shaping the backbone of advanced AI pipelines in a fast-moving, research-driven environment.

Responsibilities

  • Design and maintain distributed data ingestion and transformation pipelines for structured and unstructured datasets.

  • Build scalable ETL/ELT workflows leveraging modern frameworks (Spark, Dask, Ray, or Flink).

  • Architect and optimize data storage across cloud environments (AWS, GCP, Azure) using lakehouse table and file formats (Delta Lake, Parquet, etc.).

  • Implement validation, monitoring, and observability to ensure data quality and reliability.

  • Partner with ML and research teams to support dataset preprocessing and scaling for training workflows.

  • Use infrastructure-as-code, orchestration, and CI/CD tools (Terraform, Kubernetes, Airflow, Prefect, etc.) to create reproducible, automated environments.

Requirements

  • 5+ years of experience in data infrastructure or distributed systems.

  • Strong programming skills in Python and SQL; experience with Scala, Java, or C++ is a plus.

  • Proven experience building large-scale data pipelines and orchestration workflows.

  • Familiarity with unstructured data processing.

  • Experience with modern cloud and lakehouse architectures (e.g., S3, Databricks, Snowflake).

  • Excellent problem-solving ability and comfort operating in fast-paced, collaborative startup settings.

Benefits

  • Startup equity.

  • Health, dental, and vision insurance.

  • 401(k) plan.

  • Opportunities for professional growth in a fast-evolving AI environment.

Accessibility Statement

Read and apply for this role in the way that works for you by using our Recite Me assistive technology tool. Click the circle at the bottom right side of the screen and select your preferences.

We make an active choice to be inclusive towards everyone every day. Please let us know if you require any accessibility adjustments through the application or interview process.

Our Commitment to Diversity, Equity, and Inclusion

Our mission is to empower every person, regardless of their background or circumstances, with an equitable chance to achieve the careers they deserve. Building a diverse future, one placement at a time.
