Infrastructure, Production, DevOps, and Site Reliability Engineer
Overview #
Versatile engineer with over 15 years of relevant experience optimizing secure, scalable, and cloud-native infrastructure. Expert in coding for software and infrastructure deployment, on-the-fly incident troubleshooting across network, system, and application layers, and mentoring team members to drive operational excellence. Proven success in automating deployments for FedRAMP and PCI-DSS compliant environments, reducing mean time to resolution, and fostering cross-functional collaboration to deliver robust systems.
Seeking a role as an Infrastructure, Production, DevOps, or Site Reliability Engineer to deliver highly available products and services to customers.
Key Skills #
- Cloud and Infrastructure: AWS, GCP, Kubernetes, Terraform, Helm
- CI/CD and Automation: Jenkins, GitLab CI, GitHub Actions, Argo, Flux
- Monitoring and Observability: Prometheus, Thanos, Alertmanager, Grafana, Splunk, Loki, Elasticsearch
- Security and Compliance: PCI-DSS, FedRAMP, WireGuard, IPsec, OpenVPN
- Programming: Go, Python, Ruby, Bash, Git
- Core Competencies: Incident Response, Scalability, Cloud-Native Architecture, Cross-Functional Collaboration, Mentorship
Experience #
2024 ~ present / Cisco Systems #
Site Reliability Engineer / Frisco, TX (remote)
- Architect service re-implementation in Kubernetes with Argo and Helm.
- Achieve 99.99% uptime for FedRAMP-compliant environments at Moderate and High Impact Levels.
- Secure Provisional Authorization to Operate (P-ATO) for new environments, enabling $2-3M in federal contracts.
- Streamline deployment pipelines for over 150 component services using GitHub, Kubernetes, and Argo, reducing service onboarding time by more than 50%.
2022 ~ 2024 / Schmoll Systems LLC #
President / Frisco, TX (remote)
- Developed Go-based cloud resource management software, automating infrastructure provisioning for multiple clients across AWS and GCP.
- Reduced client infrastructure costs over 30% through server consolidation and improved autoscaling configurations.
- Mentored client teams in Kubernetes and Terraform, improving their operational effectiveness.
- Led incident response for critical outages, resolving issues within SLAs 98% of the time.
2020 ~ 2022 / Salesforce.com (MuleSoft) #
Site Reliability Engineer / Santa Fe, NM (remote)
- Enhanced stability of FedRAMP Moderate Impact Level (IL) GovCloud environments, achieving 99.9% uptime and uplifting to High IL.
- Automated incident remediation workflows, reducing manual interventions by more than 40%.
- Collaborated with development teams to implement cloud-native monitoring with Prometheus and Grafana, improving availability of common Service Level Indicators (SLIs) and establishing useful Service Level Objectives (SLOs).
- Mentored 3 junior engineers in advanced troubleshooting techniques, fostering a culture of proactive incident management.
2018 ~ 2020 / Subsplash #
Site Reliability Engineer / Santa Fe, NM (remote)
- Migrated 20+ Go-based microservices from AWS EC2 instances to AWS EKS, reducing deployment time by 50% and standardizing deployments with Terraform, GitLab CI, and Helm.
- Ensured PCI-DSS compliance for payment card processing systems, passing all audits with zero outstanding findings.
- Oversaw and implemented infrastructure consolidation across 3 distinct acquisitions, unifying networking and systems and scaling infrastructure to handle 200% user growth.
- Trained 10+ developers in Kubernetes best practices, enabling daily production deployments.
2013 ~ 2018 / Salesforce.com (Pardot) #
Site Reliability Engineer / Seattle, WA (remote)
- Automated infrastructure deployments with Chef and Terraform, supporting 10+ daily application code deployments in a dynamic environment of more than 50 developers.
- Ensured Salesforce Trust compliance, reducing security vulnerabilities by 25% through proactive monitoring with standard tooling.
- Led cross-functional teams to optimize system performance, improving application response times by more than 50%.
- Mentored junior SREs and built a scalable incident response framework.
Earlier Career #
- ServiceNow, Performance Engineer (2012 ~ 2013): Optimized MySQL database performance for 1,000+ instances, improving query response times by more than 25%.
- SAP Concur, Unix Systems Engineer (2007 ~ 2011): Managed 100+ Red Hat Linux systems and supported a Hadoop cluster for data mining; assisted with over 1,300 servers across multiple sites.
- Breakwater Security Associates, Network Engineer (2005 ~ 2006): Responded to network and system outages under strict Service Level Agreements; assisted clients with system modifications and updates.
Education #
2005 ~ 2008 / University of Washington #
Bothell, WA
- Attained Bachelor of Science in Computing & Software Systems