Data Engineer

  • Understand non-functional requirements and turn them into functional/technical requirements, then apply architectural principles to leverage the Spark framework to reproduce specific use cases and/or feature inputs for models

  • Passion for automation where appropriate and beneficial

  • Understand data acquisition, develop dataset processes, and have knowledge of core data concepts

  • Understand complex transformation logic and translate it into Spark code or Spark-SQL queries

  • Hadoop ecosystem (Spark, Hive, HBase, YARN, and Kafka), Spark Core, Spark-SQL, and live streaming datasets using Kafka/Spark Streaming

  • Unix shell scripting and knowledge of Apache Airflow or another job scheduler

  • Cloud computing platforms such as AWS or GCP

  • Cloudera distribution framework and Jenkins (or a similar CI/CD tool), plus a version control system such as Git

  • Different file formats such as Avro, ORC, Parquet, and JSON

  • Excellent understanding of technology life cycles and the concepts and practices required to build big data solutions

  • Data warehouse, data model, and feature-mart concepts

  • Core Java with experience in microservices architecture

  • Ability to understand and build reusable data assets or features that downstream data science models can read and use
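
To make the streaming bullet above concrete: Spark Structured Streaming consumes a live Kafka feed as a series of micro-batches with running aggregation state. The idea can be sketched in plain Python with no Spark or Kafka dependency; the event stream and batch size below are hypothetical stand-ins:

```python
from collections import defaultdict
from itertools import islice

def micro_batches(events, batch_size):
    """Yield fixed-size micro-batches from an event iterator,
    the way a streaming engine buffers records before processing."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def running_counts(events, batch_size=3):
    """Maintain a running count per key across micro-batches,
    analogous to a stateful streaming aggregation."""
    state = defaultdict(int)
    for batch in micro_batches(events, batch_size):
        for key in batch:
            state[key] += 1
    return dict(state)

# Hypothetical click-stream keys standing in for Kafka messages.
stream = ["home", "cart", "home", "pay", "home", "cart"]
print(running_counts(stream))  # {'home': 3, 'cart': 2, 'pay': 1}
```

In real Spark the same shape appears as `readStream` from Kafka followed by a `groupBy().count()` with state managed by the engine.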
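Job schedulers such as Airflow model a pipeline as a DAG of tasks and run each task only after its upstream dependencies finish. A minimal sketch of that ordering logic in plain Python (the task names are hypothetical):

```python
def topo_order(deps):
    """Return tasks in an order that respects dependencies.
    deps maps each task to the set of tasks it depends on."""
    order, done = [], set()

    def visit(task, path=()):
        if task in done:
            return
        if task in path:
            raise ValueError(f"cycle involving {task!r}")
        for upstream in deps.get(task, ()):
            visit(upstream, path + (task,))
        done.add(task)
        order.append(task)

    for task in deps:
        visit(task)
    return order

# Hypothetical daily pipeline: extract -> transform -> load, plus a report.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load", "transform"},
}
print(topo_order(pipeline))  # ['extract', 'transform', 'load', 'report']
```

Airflow does the same thing at scale, with retries, scheduling intervals, and operators layered on top of this dependency ordering.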
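On the file-format bullet: JSON is the only one of the four readable with the Python standard library; Avro, ORC, and Parquet are binary formats (the latter two columnar) that need libraries such as fastavro or pyarrow. A small sketch of writing and reading newline-delimited JSON, a common landing format before conversion to Parquet (the file and records are hypothetical):

```python
import json
import tempfile

records = [
    {"id": 1, "event": "click"},
    {"id": 2, "event": "view"},
]

# Newline-delimited JSON: one record per line, easy to split and stream.
with tempfile.NamedTemporaryFile("w+", suffix=".jsonl", delete=False) as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
    path = f.name

with open(path) as f:
    loaded = [json.loads(line) for line in f]

print(loaded == records)  # True
```

Columnar formats like Parquet trade this human readability for compression and column-pruned scans, which is why analytical Spark jobs usually convert landed JSON into Parquet.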
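A reusable data asset in the sense of the last bullet is typically a keyed table of precomputed features that multiple downstream models can join against. A toy sketch in plain Python (the entity keys and feature names are hypothetical):

```python
from collections import defaultdict

def build_feature_mart(transactions):
    """Aggregate raw transactions into per-customer features
    that downstream models can look up by customer id."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tx in transactions:
        totals[tx["customer_id"]] += tx["amount"]
        counts[tx["customer_id"]] += 1
    return {
        cid: {"txn_count": counts[cid], "avg_amount": totals[cid] / counts[cid]}
        for cid in totals
    }

# Hypothetical raw transactions.
txns = [
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": "c1", "amount": 30.0},
    {"customer_id": "c2", "amount": 5.0},
]
print(build_feature_mart(txns))
# {'c1': {'txn_count': 2, 'avg_amount': 20.0}, 'c2': {'txn_count': 1, 'avg_amount': 5.0}}
```

In a Spark feature mart the same aggregation would be a `groupBy("customer_id").agg(...)` persisted to a shared table so every model reads identical feature definitions.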