Avatar

Head of Data and Engineering
Amsterdam Area, Netherlands

Email: [email protected]
Linkedin: https://www.linkedin.com/in/andersonvaz/
Github: https://github.com/avaz

Hands-on Software Engineer with 17+ years of experience out of 8+ years in management leading teams 20+ software and machine learning engineers. Recently, I’ve been focusing on data engineering for machine learning projects, specifically in the area of Data Architectures with focus on MLOps and machine learning productization. I hold a major in Computer Sciences and a Master in Business Administration.

Skills

Behavioral Technical

👨🏽‍💻Very hands-on professional, capable of designing and implementing software solutions applying the most variety of standards, architectures, SAAS and technologies mainly with Java, Python and Javascript focusing in distributed systems and data engineering for machine learning and model operations and customization.

👨🏻‍🏫 Lead and mentor a team of software engineers, fostering a collaborative and innovative work environment. Provide technical guidance and expertise to the team, keeping up to date with the latest industry trends and best practices.

👀 Oversee the design, development, and maintenance of software solutions, with a strong focus on Java, Python, Javascript and Machine Learning.

📈 Manage software development projects from concept to deployment, ensuring timely delivery and quality. Effectively manage resources, and timelines to meet project objectives.

👩🏻‍🤝‍👩🏼🧑‍🤝‍🧑👩🏽‍🤝‍👨🏾Can easily get along together with cross-functional teams of a diverse set of skills like software and machine learning engineers, data scientists, product managers, executive stakeholders, to align efforts with company goals.

🗣️ Strong communication skills in written and verbal form being able to deal with internal and external stakeholders to provide updates, gather requirements, and address any technical concerns.

🧠 Entrepreneurship mindset, proven track of making business thrive through software with solid problem solving skills.

🚅 Defining and driving the organization’s technology strategy and roadmap.

🧑🏽‍🎓 Hold a Major in Computer Science, a Master’s in Business Administration, and a comprehensive set of industry technology certifications and specializations.

🫶🏽 Lover and in favor of old but not even a bit afraid in try the new. Technologies and paradigms are enjoyable things to deal with.

📚 Values good practices, design patterns, and clean code. Solid knowledge in a wide set of programming languages, frameworks, tools, and technologies in general.

♻️ Contributor and stronger supporter of open source software.

👾 Skilled video-game player, CoD mobile is the one most played at the moment.

🫐 Açaí addicted, best fruit in the world.

Java: Core, JPA/Hibernate/QueryDSL, JDBC, Bean Validation, NIO, Streams, Executor Service/Concurrency. I’m well-versed in the Spring framework and its various modules, such as Spring Boot, MVC, Security, Kafka, Camel, Data, Cloud, Rest Docs and others. Other relevant libraries/frameworks Flyway, Jackson/Custom De/Serializers/View Groups, Gradle, and Maven.

Python: I have focused on Python for the last 6 years. I work with libraries such as pathlib, os, pandas, numpy, scikit-learn, scipy, PySpark, Keras, PyTorch, Pydantic, Spotify Luigi, BeautifulSoup4, SQLAlchemy, FastAPI, setuptools, and poetry. I also have experience with Pyenv/Virtualenv and (mini)conda.

Front End: JavaScript, Node.js/npm, Vue.js v2/3, Nuxt.js v3. Additionally, HTML, CSS/Tailwind CSS, Storyblok CMS, and Strapi CMS. I have a good understanding of the Node.js and JavaScript ecosystem and can do front-end development.

Architectural Styles: I have experience with microservices, event-driven, and serverless architectures. Of course, I have created monolith applications and have experience with the strangler pattern.

Integration: Apache Camel, CXF, web/HTTP/ReST APIs.

Data Engineering: Batch, streaming, data pipelines, and data lakes, Databricks lakehouse (medallion pattern), data warehouses and data modeling.

Business Intelligence: Databricks Visualization support, Tableu, Hex, Evidence, Apache Superset.

Storage/Databases/Cache: I have solid knowledge of SQL, which helps me when working with various persistence technologies. I’ve used MySQL, PostgreSQL, SQLite, MongoDB, Hive, Pig, Hadoop, Memcache, Redis, DuckDB, and vector databases such as Milvus.

Analytics: I have an understanding of MapReduce, CDC concepts, ETL, data ingestion, cleaning, wrangling, transformation, analytics, and reporting/BI. I’ve worked with Apache Kafka/KSQLDB, (py)Spark, and Delta Lake. Additionally, I’m familiar with statistical analysis techniques.

DevOps/IaC: I have experience with GitHub Actions, Jenkins, AWS CDK/CloudFormation, Terraform, Docker, Docker Compose, and Kubernetes.

MLOps: Custom MLOps pipelines, Iterative DVC/TPI, MLflow, Ray.io, AWS SageMaker, and Databricks MLflow.

Infrastructure/Cloud: I’m most experienced with AWS and have run several production workloads with autoscale support, following the Well-Architected Framework, also, I know managed services like AWS Glue, Redshift, AppFlow, Lambda, SNS, SQS, S3, ECS, EC2(GPU), Aurora/RDS, CodeCommit/Deploy. I know very well Google Cloud and services like BigQuery, Cloud Storage, Cloud Spanner, and Cloud Functions, Monitoring, Compute. On Azure I have basic knowledge of services like Azure Functions, Azure Storage, Azure SQL, Azure DevOps, and Azure Kubernetes Service. I have knowledge with monitoring and logging tools like AWS CloudWatch, Google Cloud Monitoring, Grafana, and Prometheus.

Recent projects:

  • Quality Estimation API (https://api.taus.net), an application that accurately estimates the quality of machine translation output. The API is built using Python, FastAPI, Pydantic, PySpark, and PyTorch. The API is deployed on AWS EC2 GPU/cuda instances with Spring Cloud Gateway and Memached for caching. The API is deployed using Docker and GitHub Actions. Also, a full MLOps pipeline is in place for training and deploying new models, using PyTorch, Iterative/DVC/TPI, AWS S3/Lambdas. — TAUS 2023

  • Revamped the TAUS website (https://taus.net), replaced Joomla with using Nuxt.js v3, Tailwind CSS, and Storyblok CMS. Joomla is great and a nice CMS but we had just one person in the company who knew how to work with it, with the revamp we have a much more flexible and modern website that is straightforward to maintain and update by anyone in the company. The website is deployed on AWS ECS using Docker and GitHub Actions. This application also integrates with other microservices for authentication and authorization and user management. — TAUS 2022

  • TAUS Data Marketplace (https://datamarketplace.taus.net), a marketplace for language data. The marketplace is built using Java 17, Spring Boot/MVC/Security/Kafka/Data, JPA/Hibernate, QueryDSL, Jackson, Flyway, Gradle, AWS RDS MySQL. The marketplace is deployed on AWS ECS using Docker and GitHub Actions with OpenId Connect authorization. — TAUS 2022

  • A serverless Data architecture for data processing and analytics. The architecture is built using Spring Camel, Apache Camel, QueryDSL/SQL, AWS Lambda/S3/Aurora/SNS/SQS/ECS, to AWS ECS using GitHub Actions. The data processing starts by uploading a file to S3 which triggers a lambda function that checks if there is enough AWS ECS Tasks available (rate limiting) to process the file, if not it triggers a new task otherwise forwards the message to a SQS queue which later will be consumed by the task. The task once completed sends a notifications to SNS where other components (ECS Tasks) will consume the message and process the data. Errors are sent to a SQS DLQ for further investigation. — TAUS 2021

  • A machine learning pipeline to train and inference a model to predict the amount of baking articles that should be baked for one of Europe’s biggest retail supermarket companies. The pipeline is built using Python, PySpark, Numpy, Pandas, Jinja, Apache Hadoop/pySpark, Hive and Spotify Luigi. — Teradata/LiDL 2019/2020

Education

Student Professor/Instructor

Master of Business Administration (MBA)
Fundação Getúlio Vargas São Paulo
2010 - 2011

https://educacao-executiva.fgv.br/

Graduate Studies Professor
Senac São Paulo
2010 - 2011

http://www.sp.senac.br

Bachelor’s in Computer Sciences
Pontifícia Universidade Católica de São Paulo
1999 - 2004
https://www.pucsp.br/graduacao/ciencia-da-computacao

Java and Middleware Certified Instructor
Oracle Brazil
2008 - Present*
https://education.oracle.com/java/java/pFamily_48
* I’m living in the Netherlands, back in Brazil I’m still an Oracle instructor

Java/SAP ABAP Certified Instructor
SAP
2004 - 2008
https://www.sap.com

Specializations

Certifications/Accreditations Courses

AWS Certified Cloud Practitioner
https://www.credly.com/badges/6e80c9f1-6eaa-4ac2-b4c1-6b070b6207c7

Get Started with Machine Learning on Databricks
Get Started with Data Engineering on Databricks
Data Workloads with Repos and Workflows on Databricks
Data Analytics and Warehousing with Databricks SQL
Data Storage and Management with Delta Lake
Data Governance with Unity Catalog
Data Pipelines with Delta Live Tables
Generative AI Fundamentals
Streaming with Structured Streaming and Delta Live Tables

Databricks Academy
https://customer-academy.databricks.com

Databricks - Academy Accreditation - Generative AI Fundamentals
https://credentials.databricks.com/adb7bbb9-985a-4f99-950b-3814dd9adf2c

Introduction to Data Engineering
https://www.coursera.org/account/accomplishments/verify/ZQ2D56H7BFPF

Databricks - Academy Accreditation - Lakehouse Fundamentals
https://credentials.databricks.com/a8fb7f4f-544c-450a-b7ae-94b61f1663e9

Teradata - Intro to Vantage
https://www.credly.com/badges/68d53e4f-5993-4ddc-88d9-3e29d029dfca/linked_in_profile

Confluent Fundamentals Accreditation
https://www.credential.net/ea52a54e-2989-4a91-bcbe-2d1e0d64d144#gs.4q45ly

Hadoop Operations: Hadoop Administration Foundation - Hortonworks/Teradata, 2018

Certified Web Components for Java Platform
Certified Programmer for Java Platform
Certified Associate for Java Platform

Oracle (former Sun) University, 2006

MongoDB Aggregation Framework (M121)
MongoDB for DBAs (M102)
MongoDB for Java Developers (M101J)
Mongo University, 2016

SAP Certified Instructor for SAP Technologies
SAP Netweaver Workbench for ABAP +
SAP University and Instructors Program, 2008

COBIT - Auditing IT Systems
Galegale & Associates, 2011

ReST from Scratch (by Guilherme Silveira)
Continuous Delivery (by Martin Fowler and Jez Humble)

QCon Conference Workshops, 2010

Web Application Security Training
Bonsai Information Security, 2010

Introduction to TCP/IP Architecture
Telefonica Brazil, 2006