at JPMORGAN CHASE & CO. in Wilmington, Delaware, United States
Duties: Responsible for the design, development, and implementation of a Data Lake solution with petabytes of data for JPMC’s Risk and Finance organizations. Responsible for building ETL processes to copy data from different source systems onto the Hadoop platform to build machine learning models. Build Data Quality controls for data ingested from source to target. Design and set up a machine learning platform to develop and train machine learning models. Design and manage a Big Data Hadoop machine learning platform with petabytes of storage and 50k+ vcores for model development and training using Big Data tools such as Hive, Impala, and Sqoop. Design, implement, and support advanced machine learning tools, such as Anaconda packages, XGBoost, SciPy, NumPy, pandas, and TensorFlow, and support model development. Responsible for performance tuning of Spark jobs and implementing best practices in Spark job development. Support a machine learning platform with a large number of users and set up platform controls and governance processes. Set up new tools in alignment with the firmwide control process. Automate platform monitoring using Python and Unix shell scripts: set up real-time alerts for users consuming excessive compute and storage on the platform, archive and clean up unused tables past the retention period, and detect unauthorized package installations on the platform. Design and develop machine learning fraud models using graph databases such as TigerGraph. Set up and support a TigerGraph cluster to support model development. Utilize AWS (EC2, S3, EMR, and SageMaker) to migrate machine learning models from Hadoop to AWS.
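The platform-monitoring automation described above (real-time alerts for heavy compute/storage consumers and cleanup of tables past retention) can be illustrated with a minimal Python sketch. The retention window, vcore threshold, table names, and usage figures below are illustrative assumptions, not values from the posting:

```python
from datetime import datetime, timedelta

# Hypothetical policy values -- the posting does not specify actual numbers.
RETENTION_DAYS = 90
VCORE_ALERT_THRESHOLD = 500

def tables_past_retention(tables, now, retention_days=RETENTION_DAYS):
    """Return names of tables whose last access predates the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [t["name"] for t in tables if t["last_accessed"] < cutoff]

def users_over_quota(usage, threshold=VCORE_ALERT_THRESHOLD):
    """Return users whose vcore consumption exceeds the alert threshold."""
    return [user for user, vcores in usage.items() if vcores > threshold]

if __name__ == "__main__":
    now = datetime(2021, 6, 1)
    # Hypothetical metadata, as would be collected from the cluster metastore.
    tables = [
        {"name": "risk.stale_tbl", "last_accessed": datetime(2021, 1, 15)},
        {"name": "finance.active_tbl", "last_accessed": datetime(2021, 5, 20)},
    ]
    usage = {"alice": 620, "bob": 110}
    print(tables_past_retention(tables, now))  # ['risk.stale_tbl']
    print(users_over_quota(usage))             # ['alice']
```

In practice, a job like this would be scheduled (e.g., via cron) to read metastore and resource-manager metrics and send the alerts, rather than printing.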
Minimum education and experience required: Bachelor’s degree or equivalent in Information Technology, or related field, plus 7 years of related experience in application development, or related experience; OR Master’s degree or equivalent in Information Technology, or related field, plus 5 years of related experience in application development, or related experience.
Skills Required: Experience in designing and developing applications using Hive, Impala, and Spark. Experience in performance tuning of jobs running on a distributed database architecture. Experience in application development on the Cloudera Big Data platform and with Hadoop tools. Experience in the design, coding, and implementation of Data Lake and data load activities. Experience in platform monitoring and in building tools to automate platform monitoring. Experience in the development and support of applications using NoSQL databases. Experience automating platform monitoring tools using Python and Unix shell scripts, setting up real-time alerts for users consuming excessive compute and storage on the platform, and archiving and cleaning up unused tables past the retention period. Experience managing small to medium teams to complete deliverables. Experience in managing data lake and data warehouse solutions that bring in data from different source systems using ETL technologies. Experience in designing applications with distributed databases to handle larger datasets. Experience in building Data Quality controls for data ingested from source to target systems. Employer will accept any amount of professional experience with the required skills.
To apply for this position, visit our website at https://careers.jpmorgan.com/us/en/professionals and apply to job number 210179513. JPMorgan Chase is an Equal Opportunity and Affirmative Action Employer, M/F/D/V.