Data Engineer

New York City, NY, United States Full Time Mid-Level

The Center for Data Science and Analytics is the innovative corporate Analytics group within New York Life. We are a rapidly growing entrepreneurial department, which aims to design, create and offer innovative data-driven solutions for many parts of the enterprise. We are aided by New York Life’s existing business with a large market share in individual life insurance. We have the freedom to explore external data sources and new statistical techniques, and are excited about delivering a whole new generation of Analytical solutions.

In fact, we are designing and will build one of the first multivariate, model-based continuous risk differentiations in the industry. This model will incorporate current underwriting best practices (including medical rules) as features and add other data sources, patterns, ideas and variables, essentially creating a rating plan to support the next-generation underwriting process at New York Life. This is just one of several projects with large business value. Geographic analytics on agents and customers, application fraud detection, agent success prediction and client prospecting analytics (offline and online) are other exciting examples of enormous incremental value from analytics. Our products will be implemented in the real-time core business processes and decisions that drive the company (e.g. underwriting, pricing, agent recruiting, prospecting, new product development).

We work with data ranging from demographics, credit and geo data to detailed medical data (medical test results, diagnosis, prescriptions) and social media information. We have a modern computing environment with a solid suite of data science/modeling tools and packages, and a large (but manageable) group of well-trained professionals at various levels to support you. Life insurance is on the verge of huge change. This is a chance to be part of, actually to drive, the transformation of an industry.

You will be part of the Data & Platform sub-function team within the Center for Data Science and Analytics. The Data & Platform team serves internal Data Scientists who focus on statistical analysis.

You will be part of a fast-paced, high-impact team that works with an entrepreneurial mindset, using best-of-breed tools such as R, Spark and Python on our Enterprise Data Lake (Hadoop).

You will apply your data engineering skills to build pipelines and workflows that gather, cleanse, test and curate data from Oracle, MSSQL Server and 3rd-party sources, and create datasets in the Enterprise Data Lake (Hadoop) for use by several teams of predictive modelers.

You will run proofs of concept and test new software tools that fall under the umbrella of data science but are geared more toward data engineering.

Responsibilities
  1. Ingests, merges, prepares, tests and documents curated datasets from various novel external and internal data sources for a variety of advanced analytics projects, such as multivariate models for Risk, Marketing and Compliance
  2. Utilizes data wrangling/data matching/ETL techniques to explore a variety of data sources, gain data expertise, perform summary analyses and curate datasets
  3. Functions as data expert, contributes to analytics/solutions design and productizing decisions
  4. Works independently with some supervision and as part of a collaborative team
  5. Works with Project Managers and Scrum Masters to provide milestones and stories
  6. Proactively and effectively communicates in various verbal and written formats with senior-level members of the team and partners
Required qualifications
  • Graduate-level degree in computer science or engineering, or relevant experience in the fields of Business Intelligence, Data Mining, Database Engineering or Programming
  • 3-5 years of overall experience in data wrangling and programming, with a minimum of 1 year of experience ingesting, cleaning and merging data and applying the necessary data wrangling logic in Hadoop
  • 1+ years writing complex SQL queries in any of the following or similar databases: Oracle, SQL Server, DB2, MySQL
  • Proficiency in Python for all data-related work, including libraries such as NumPy, pandas and PySpark
  • Experience working with the Linux operating system
  • Experience working with data visualization tools or packages
  • Experience building Exploratory Data Analysis reports with charts such as histograms, box plots, Pareto charts and scatter plots, using R, Python or a data visualization tool such as Tableau or Spotfire
Preferred qualifications
  • Understanding of statistical modeling concepts, designs and analytics-based products
  • Any experience using ETL tools such as Ab Initio, Talend, Informatica or Pentaho
  • Any experience working with Data Warehouses and/or Data Marts
  • Any experience in the life insurance business