KARYA CONSULTANTS PRIVATE LIMITED
ML Data Engineer - Pandas/Numpy
Job Location
Bangalore, India
Job Description
Responsibilities:
- Designing, developing, and executing data pipelines to ingest, preprocess, and transform data for Generative AI model training and inference.
- Proficiency in data manipulation and preprocessing using tools like NumPy, Pandas, or SQL.
- Familiarity with big data technologies such as Hadoop and Spark for processing and analyzing large-scale datasets.
- Designing and implementing data pipelines for Generative AI projects using technologies including Vector DB, Graph DB, Airflow, Spark, PySpark, Python, LangChain, LlamaIndex, OpenAI functions, AWS Functions, Redshift, and SSIS. This involves integrating these tools logically and efficiently to create seamless, high-performance data flows that support the data requirements of our AI initiatives.
- Collaborating with data scientists, AI researchers, and other stakeholders to understand data requirements and translate them into effective data engineering solutions.
- Demonstrating familiarity with data integration services like AWS Glue and Azure Data Factory, and using these platforms for data ingestion, transformation, and orchestration across various sources and destinations.
- Proficiency in constructing data warehouses and data lakes, organizing and consolidating large volumes of structured and unstructured data for efficient storage, retrieval, and analysis.
- Implementing data security and governance policies to ensure the privacy and integrity of sensitive data used in Generative AI projects.
- Monitoring and optimizing data pipelines for performance, scalability, and cost-effectiveness.
- Staying updated on the latest advancements in data engineering tools and technologies (e.g. Apache Spark, Airflow, Snowflake, Databricks) and applying them to our Generative AI platform.
- Effectively communicating with technical and non-technical stakeholders about data quality and availability for Generative AI projects.
Qualifications:
Minimum Qualifications:
- Bachelor's degree in Computer Science, Data Science, Statistics, or a related field, or equivalent experience.
- Experience in data engineering or related roles such as data pipeline development, data storage, or ETL/ELT processes.
- Proven experience in building and maintaining data pipelines for machine learning projects.
- Strong understanding of data modeling principles, data quality measures, and data security best practices.
- Proficiency in programming languages such as Python and SQL, and in scripting languages (e.g. Bash).
- Familiarity with cloud platforms (e.g. AWS, Azure) for data storage and processing.
- Excellent communication, collaboration, and problem-solving skills.
- Ability to work independently and as part of a team.
(ref:hirist.tech)
Location: Bangalore, IN
Posted Date: 5/2/2024
Contact Information
Contact: Human Resources, KARYA CONSULTANTS PRIVATE LIMITED