The following is a summary of my professional achievements during 2015 while working at Velocix.
Velocix (Alcatel-Lucent Video Business Unit)
- I am recognised as a friendly and helpful face of the Tools & Infrastructure team, and many engineering customers approach me first as an uncomplicated point of contact when they need help.
- I am an excellent communicator who uses clear and concise language to leave no room for ambiguity. I have extensive experience of tailoring communication for different business cultures, with India, the USA and mainland Europe being good examples.
- Despite an unavoidable absence due to jury duty, I have delivered a constant stream of projects throughout my first year within the company, requiring only the bare minimum of direction from management.
- During my end of year review I was deemed to have exceeded requirements and expectations in all areas of my job and performance.
Duties
- Management of a mixture of approximately 400 Linux servers, blade servers, assorted network appliances, and sundry switching and routing equipment.
  - Including physical maintenance: racking, powering, patching, installation and decommissioning of hardware.
- Training of new or junior engineers, as well as cross-skilling other engineers within the team.
  - Including intensive training of global sourcing partner employees in Chennai, as well as ongoing support of their roles.
- Maintenance and support of all engineering development services, including Perforce, Zabbix, Munin, Cacti, Nagios, ReviewBoard, Cognidox, Coverity, Co-Advisor, RackTables, and (to a lesser extent) various in-house and OTS build and test products such as Testlink and Buildbot.
- Last line expert (non-product) support for product development engineering scrum teams.
- Peer reviews of code and configuration, and promoting best (and better) practice.
- Proactively removing legacy code and systems architecture to reduce the day-to-day cost of maintenance and BAU tasks.
Achievements
- Implementation of Zabbix monitoring.
  - Replaced an old and poorly implemented Nagios monitoring system that was already deprecated but had no logical replacement.
  - Automated the deployment of Zabbix servers and agents with Ansible.
  - Automated the integration and configuration of engineering user access, giving engineers detailed visibility of the services and hardware that they use.
  - Resulted in time savings, an increased sense of ownership and involvement from teams, accountability pushed to those who are able to resolve issues, and reduced costs due to fewer outages and less downtime.
  - Recognised subject matter expert. Have identified and fed back multiple bug reports to the software authors.
  - Munin build was automated with Ansible, and the rebuild was incorporated into the Zabbix rollout.
- Perforce server migration.
  - Quantified business benefits and influenced senior management to approve the implementation of new purpose-built hardware.
  - Planned and executed the migration on my own, with only an additional set of eyes for validation at the final out-of-hours implementation.
  - Sizeable cost savings to the company, as well as a much happier development team and improved productivity due to no more outages.
- Training apprentices, other engineering staff and Global Service Provider employees (Chennai, India).
  - The team has become more effective by having skills pushed out to multiple geographical locations, with effectively 24-hour cover.
  - Redundancy of skills has meant that the team is able to function better during periods of staff sickness or holiday. Reduction of single points of failure and "tribal knowledge".
- Automation of user account management.
  - Has resulted in huge savings in time, and thus money.
  - Makes it massively quicker and easier to train staff on the process, as it doesn't require specialist knowledge (it can even be done by apprentices in their first week).
- Bacula backups implemented.
  - Achieved buy-in from management to implement comprehensive backups after backups were initially denied as an engineering priority for the Tools & Infrastructure team.
  - Mitigated risk by adding restore abilities into the disaster recovery plan, achieving parity with existing SecureStore backups, plus covering many more servers and services.
  - Solution was significantly faster and cheaper than SecureStore.
  - Properly documented so that more engineering people can restore their own files.
  - Reporting is visible and understandable, whereas the old solution was incomprehensible to most.
- Automated deployment with Ansible.
  - Simplified and consolidated Ansible automation development to make configuration more flexible and easier to expand with reduced overhead.
  - Pushed use of Ansible to encompass many more areas of engineering where it wasn't previously used, for example DNS and monitoring.
  - Significant efficiency improvements resulting in cost savings.
  - Recognised subject matter expert. Have identified and fed back multiple bug reports to the software authors.
- DNS automation.
  - DNS changes are now self-service for all engineering teams, helping push an improved sense of ownership and inclusion, plus accountability to those who make changes.
  - Automated zone file and config generation with extensive unit tests, before deployment with Ansible.
  - DNS is now much safer to update and push to production. There have been no DNS outages or failures since this was done, which has reduced disruption to the business.
  - Cost savings as a result of no more outages, and quicker turnaround of changes.
- Co-ordinated DC ownership + emergency / panic documentation.
  - It is now quicker to find the right person or procedure in an emergency.
  - Highlighted numerous SPoFs, critical failings and frailties in core engineering infrastructure. Solutions have been presented to management as a means to move forward, to be driven by myself or others where appropriate.
- Security overhaul buy-in from senior management.
  - Raised numerous critical security issues upon joining the company (within the first month).
  - Documented and quantified the legacy of security shortcomings, as well as their legal and real-world impact, to justify corrective works.
  - Co-ordinated with other units within Alcatel-Lucent to get multiple areas of the company to agree upon an initial mitigation plan, and then put in place permanent solutions.
- Raised bash and shell scripting standards within the engineering group.
  - Implemented standardised libraries (such as pure bash command line argument parsing and help).
  - Corrected and patched various legacy scripts in the existing code base, thus reducing technical debt and legacy code shortcomings.
  - Trained and educated people through code reviews and individual coaching.