Infrastructure, Production, DevOps, Site Reliability, and Cloud Security Engineer
Overview #
Versatile engineer with over 15 years of relevant experience optimizing secure, scalable, cloud-native infrastructure. Expert in writing software and infrastructure deployment code with modern AI tooling, troubleshooting incidents on the fly across network, system, and application layers, and mentoring team members to drive operational excellence. Proven success automating deployments for FedRAMP- and PCI-DSS-compliant environments, reducing mean time to resolution (MTTR), and fostering cross-functional collaboration to deliver robust systems.
Key Skills #
- Cloud and Infrastructure: AWS, GCP, Kubernetes, Terraform, Helm, Ollama & self-hosted LLMs
- CI/CD and Automation: Jenkins, GitLab CI, GitHub Actions, Argo, Flux
- Monitoring and Observability: Prometheus, Thanos, Alertmanager, Grafana, Splunk, Loki, Elasticsearch
- Security and Compliance: PCI-DSS, FedRAMP, WireGuard, IPsec, OpenVPN
- Programming: Claude Code, OpenAI Codex, Go, Python, Ruby, Bash, Git
- Core Competencies: AI, Incident Response, Scalability, Cloud-Native Architecture, Cross-Functional Collaboration, Mentorship
Experience #
2026 / Block #
Cloud Security Engineer / Frisco, TX (remote)
- Designed and delivered an enriched error reporting pipeline for a network ACL deployment system used by Square, Cash App, and others, propagating failure context across AWS Lambda functions via S3 and surfacing actionable alerts in Slack to reduce mean time to detect (MTTD).
- Improved reliability of a proxyless security pipeline by implementing retry logic, hardening automated CA bundle and system registry API calls against transient failures.
- Ramped up on a complex security platform quickly, shipping production-ready code across AWS Lambda, S3, Cloudflare One, and Fastly CDN/NGWAF services while completing all new-engineer onboarding.
- Departed due to a layoff that affected half of the company.
2024 ~ 2025 / Cisco Systems #
Site Reliability Engineer / Frisco, TX (remote)
- Architected a service re-implementation in Kubernetes with Argo and Helm.
- Achieved 99.99% uptime for FedRAMP-compliant environments at Moderate and High Impact Levels.
- Streamlined deployment pipelines for over 150 component services using GitHub, Kubernetes, and Argo, reducing service onboarding time by more than 50%.
2022 ~ 2024 / Schmoll Systems LLC #
President / Frisco, TX (remote)
- Developed Go-based cloud resource management software, automating infrastructure provisioning for multiple clients across AWS and GCP.
- Reduced client infrastructure costs over 30% through server consolidation and improved autoscaling configurations.
- Mentored client teams in Kubernetes and Terraform, improving their operational effectiveness.
- Led incident response for critical outages, resolving issues within SLAs 98% of the time.
2020 ~ 2022 / Salesforce.com (MuleSoft) #
Site Reliability Engineer / Santa Fe, NM (remote)
- Enhanced stability of FedRAMP Moderate Impact Level (IL) GovCloud environments, achieving 99.9% uptime and uplifting them to High IL.
- Automated incident remediation workflows, reducing manual interventions by more than 40%.
- Collaborated with development teams to implement cloud-native monitoring with Prometheus and Grafana, improving availability of common Service Level Indicators (SLIs) and establishing useful Service Level Objectives (SLOs).
- Mentored 3 junior engineers in advanced troubleshooting techniques, fostering a culture of proactive incident management.
2018 ~ 2020 / Subsplash #
Site Reliability Engineer / Santa Fe, NM (remote)
- Migrated 20+ Go-based microservices from AWS EC2 instances to AWS EKS, reducing deployment time by 50% and standardizing deployments using Terraform, GitLab CI, and Helm.
- Ensured PCI-DSS compliance for payment card processing systems, passing all audits with zero outstanding findings.
- Oversaw and implemented the consolidation of infrastructure from 3 distinct acquisitions, unifying networking and systems, and scaling infrastructure to handle 200% user growth.
- Trained 10+ developers in Kubernetes best practices, enabling daily production deployments.
2013 ~ 2018 / Salesforce.com (Pardot) #
Site Reliability Engineer / Seattle, WA (remote)
- Automated infrastructure deployments with Chef and Terraform, supporting 10+ daily application code deployments in a dynamic environment of more than 50 developers.
- Ensured Salesforce Trust compliance, reducing security vulnerabilities by 25% through proactive monitoring with standard tooling.
- Led cross-functional teams to optimize system performance, improving application response times by more than 50%.
- Mentored junior SREs and built a scalable incident response framework.
Earlier Career #
- ServiceNow, Performance Engineer (2012 ~ 2013): Optimized MySQL database performance for 1000+ instances, improving query response times by more than 25%.
- SAP Concur, Unix Systems Engineer (2007 ~ 2011): Managed 100+ Red Hat Linux systems and supported a Hadoop cluster for data mining; assisted with over 1300 servers across multiple sites.
- Breakwater Security Associates, Network Engineer (2005 ~ 2006): Responded to network and system outages under strict Service Level Agreements; assisted clients with system modifications and updates.
Education #
2005 ~ 2008 / University of Washington #
Bothell, WA
- Attained Bachelor of Science in Computing & Software Systems