Welcome!

A little about me...

I'm A.M. Knight, an Infrastructure and Software Engineer with a passion for building scalable, reliable systems. With extensive experience in both cloud infrastructure and software development, I specialize in creating solutions that are both robust and efficient. My expertise spans multiple domains including:
  • Infrastructure and DevOps practices
  • System architecture and performance optimization
  • Software development and testing
  • Security best practices and compliance
  • Platform building and developer tooling
I enjoy working with cutting-edge technologies to solve complex problems and create impactful solutions for businesses and individual customers. When I'm not coding, you can find me exploring new technologies, homebrewing beer, painting and 3D printing miniature wargaming models, and working on a new fantasy novel series.

Education

2007-2009
George Mason University
Bachelor of Arts in Anthropology
2009-2011
Columbia University
Master of Arts in Anthropology

Experience

2024-Now
Staff Site Reliability Engineer - Environment Automation
GitLab. Remote

Team lead for the core commercial offering of GitLab Dedicated.

  • Led development of Hosted Runners to Limited Availability, which secured multiple multi-year customer commitments of $500k+ in ARR
  • Developed blueprints for Zero Downtime Deployment and Disaster Recovery
  • Served on the technical escalation path for critical GitLab Dedicated incidents
  • Key expert in a working group to overhaul SRE hiring across the company
  • Led analysis and implemented fixes for critical scaling issues affecting the largest GitLab Dedicated tenant
  • Earned 2 discretionary bonuses and an engineering award for best blueprint

2022-2024
Senior Site Reliability Engineer - Environment Automation
GitLab. Remote

Fully remote SRE working on building the new GitLab Dedicated offering.

  • Added AWS instance event alerting for our tenants
  • Implemented GitLab Geo for Dedicated tenants as a DR solution

2021-2022
Principal Site Reliability Engineer
Magic Leap. Remote

Fully remote SRE working on platform initiatives and supporting key applications for enterprise customers.

  • Implemented a proof-of-concept Kubernetes-based machine learning platform (Kubeflow)
  • Wrote a custom Kubernetes operator in Go to create databases and set database permissions for users in Google Cloud SQL
  • Created automation to aggregate NAT IPs for allowlisting across our GCP projects
  • Collaboratively planned go-live for services/onboarding portal for ML2
  • Advised on high-level purchases of software and services

2020-2021
Senior Site Reliability Engineer
Blizzard Entertainment. Remote

Fully remote SRE embedded with the Long Term Analytics (“big data”) team, collaborating with game teams and other data teams.

  • Wrote a tool in Go to streamline Argo CD configuration across teams
  • Implemented cert-manager and external-dns for my embedded team
  • Built out a custom Atlantis (Terraform automation) instance in GKE to allow reviews of infra changes via PR

2019-2020
Senior Site Reliability Engineer
Magic Leap. Remote

Fully remote position working with a globally distributed SRE team supporting Magic Leap’s platforms and websites.

  • Created a pipeline to manage our GCP projects and user permissions as code using Terraform, managing over 417 projects
  • Created a pipeline to manage our GCP Shared VPC, provisioning 53 subnets in different projects for on-premises connectivity
  • Maintained 20+ Terraform provider forks and hundreds of Terraform modules in Go and Terraform
  • Served as a primary architect of a Kubernetes Platform as a Service (PaaS) running internal and major external workloads, scaling to accommodate product launches and hundreds of thousands of requests
  • Ran Knative in production as the primary feature of the PaaS, providing automatic scaling to 0, Istio service mesh/routing, and automatic provisioning of SQL databases for services using operator-sdk and CRDs

2018-2019
Site Reliability Engineer
Apple. Remote

Part of an SRE team supporting Apple Maps.

  • Primarily supported an internal tool for managing bare-metal servers and a workflow engine, both built in Ruby on Rails
  • Monitored site reliability and performance while building monitoring tools to automate and document this work
  • Worked with developers to support new features and releases and to consult on architecture
  • Scaled infrastructure and responded to production incidents, owning production for the services/sites

2013-2018
Web Engineer II
King Arthur Baking (previously King Arthur Flour). Remote

Part of a remote team working for internal clients on projects across the company’s ecommerce site and backend systems.

  • Integrated a tax API for all shopping transactions
  • Created an automated deployment system using GitHub to push updates to the static site
  • Built a Disaster Recovery environment in AWS for our data-center-based servers
  • Modernized tooling and infrastructure
  • Rebuilt failed production servers from scratch, transitioning from hand-built servers to repeatable Ansible playbooks

2011-2013
Programmer
Accenture. Remote

Developer filling primarily Java roles within the Federal Services division for major government clients.

  • Arrived with no knowledge of Spring or Java; by the time I left, I had built a prototype front-end redesign and was teaching lunch-and-learns on Spring/Java best practices

Andy is the kind of person that everyone needs on their team. We spent a great deal of time working together on a variety of critical projects, and I’ve always been amazed by his ability to dive into new technologies and deliver outstanding results. He’s hard working, honest, and fun to be around. I would work with Andy again in a heartbeat.

Anthony Lucillo

Senior SRE

Working with this colleague has been an absolute pleasure. Their dedication, professionalism, and positive attitude make them an invaluable asset to our team. They consistently go above and beyond to ensure that projects are completed on time and to the highest standard. Their ability to collaborate effectively and support their colleagues is truly commendable. I am continually impressed by their innovative thinking and problem-solving skills. This colleague is not just a team member but a true inspiration and a driving force behind our team’s success.

Romain Chalumeau

Principal SRE

Andy has great deep technical knowledge which clearly translates into the reliable and scalable services he builds. Always trying to help others, very friendly and good communication skills. With his excellent Kubernetes skills, any team would be lucky to have Andy onboard.

Filipe Santos

Senior SRE