A lot of people are lucky to see the inner workings of a single tech giant or Fortune 100 company. Red Hat has the pleasure of working with basically all of them. And I think that’s what I love most about being a consultant and architect at Red Hat — I get to speak to, and work with, so many different people and groups from all of the biggest companies all over the world.
I think it’s endlessly fascinating to hear about what everyone else out there is doing. And nowadays, I talk to all sorts of people about network automation. This is the definition of a dream job!
After spending years building out massive network automation projects spanning tens of thousands, even hundreds of thousands, of devices, I can say that device management is easier than ever. Ansible has evolved staggeringly quickly, and in a lot of ways, the actual device configuration is a problem that’s quickly being solved.
More than ever, the question I get most often is simply: how do I get started with a network automation project? Nowadays, the biggest hurdle is often just wrapping your head around the options and the different ways of doing things.
How do you begin even thinking about going from a small lab with a dozen devices…to some gigantic production network with thousands upon thousands of devices that you’re supposed to just…“manage and automate”?
From my perspective as an infrastructure architect, this is one of the most practical approaches I’ve found for people to begin managing their network:
1. You should start your automation journey by gathering facts from everything on your network.
For example, this is my network fact role that I have been using for years. Anytime I come across a new network OS somewhere, I add it to the list:
https://github.com/harrytruman/facts-machine
Anyway, this role will give you parsed configs for everything Cisco/Arista/JunOS, and the raw config (show run all) for every other device it encounters.
Here’s more about how/why I do fact collection first:
https://www.landoman.com/2020/02/07/automating-networks-with-ansible-part-1/
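To make the parsed-versus-raw split in step 1 concrete, here is a minimal Python sketch. It is not code from the facts-machine role. The `PARSED_PLATFORMS` set, the `collection_plan` helper, and the sample inventory are all hypothetical names chosen for illustration, and the platform list is an assumption based only on the Cisco/Arista/JunOS grouping mentioned above.

```python
from typing import Dict

# Assumption: short OS names in the style of Ansible's ansible_network_os.
# This set is illustrative only, not the role's actual supported list.
PARSED_PLATFORMS = {"ios", "nxos", "iosxr", "eos", "junos"}


def collection_plan(network_os: str) -> Dict[str, str]:
    """Decide how facts would be gathered for one device (hypothetical helper)."""
    # Accept both short names ("ios") and dotted names ("cisco.ios.ios").
    os_name = network_os.lower().split(".")[-1]
    if os_name in PARSED_PLATFORMS:
        # Known platform: collect structured, parsed facts.
        return {"os": os_name, "method": "parsed_facts"}
    # Unknown platform: fall back to capturing the raw running config.
    return {"os": os_name, "method": "raw_config", "command": "show run all"}


# Hypothetical inventory: hostname -> network OS.
inventory = {
    "core-rtr-01": "ios",
    "leaf-sw-07": "arista.eos.eos",
    "fw-edge-02": "fortios",
}

plans = {host: collection_plan(os_name) for host, os_name in inventory.items()}
for host, plan in plans.items():
    print(host, plan)
```

The point of the fallback branch is that no device gets skipped: anything the parsers don't know yet still lands in the data set as raw text, and can be promoted to parsed facts later.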
2. Once you have fact collection running, then you’re ready to begin Ansible state/config management!
Building playbooks has never been quicker and easier. As of 2.9, Ansible’s network resource modules let us do state management rather than just config management. Identify variables, build templates, and the modules do the rest. This is the easiest way to build and implement backups/restores too.
The resource modules will determine which commands need to be sent and in which order, whether things need to be removed first, and so on. Things that once took a year can now be done in days or weeks.
https://www.landoman.com/2020/02/25/managing-network-interface-states/
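The difference between state management and plain config pushing in step 2 is easier to see in code. The Python sketch below is a toy illustration of the idea, not how Ansible’s resource modules are actually implemented. It compares a current and a desired interface state, emits removals before changes, and returns nothing when the two already match. The `plan_commands` function, the data layout, and the IOS-style command strings are all assumptions made for this example.

```python
from typing import Dict, List

Interface = Dict[str, object]


def plan_commands(current: Dict[str, Interface],
                  desired: Dict[str, Interface]) -> List[str]:
    """Return the CLI lines needed to move `current` toward `desired` (toy model)."""
    commands: List[str] = []
    for name in sorted(desired):
        have = current.get(name, {})
        want = desired[name]
        lines: List[str] = []

        # Removals first: settings on the device that the desired state drops.
        for key in sorted(have):
            if key not in want and key != "enabled":
                lines.append(f"no {key}")

        # Then additions and changes, skipping anything already correct.
        for key in sorted(want):
            if have.get(key) == want[key]:
                continue
            if key == "enabled":
                lines.append("no shutdown" if want[key] else "shutdown")
            else:
                lines.append(f"{key} {want[key]}")

        # Only enter the interface context when there is something to change.
        if lines:
            commands.append(f"interface {name}")
            commands.extend(lines)
    return commands


current = {"GigabitEthernet0/1": {"description": "old uplink", "mtu": 9000, "enabled": False}}
desired = {"GigabitEthernet0/1": {"description": "uplink to core", "enabled": True}}

for line in plan_commands(current, desired):
    print(line)
```

Running the same plan against a device that already matches the desired state produces an empty command list, which is the property that makes repeated runs safe.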
3. Next, use your network facts to build or enhance your CMDB!
Establishing a CMDB is the prerequisite to doing anything with Ansible long-term. With Tower as the API/UI around Ansible, I prefer pairing it with Elasticsearch [ELK] stacks to create a full-tilt CMDB and search engine combo.
This is all done through Ansible Facts and Tower logging — nothing else required. I gather facts against everything on the network, and I use playbooks to search Elasticsearch for that data from whatever time I’m interested in, so I can compare/diff or retrieve specific configs to be used as backups/restores.
Keep in mind that Tower itself is often not the best place to be doing heavy searching and log/job analysis. In general, we recommend you offload search and analytics to an external service. And at large scale — and certainly at high volume — facts and logging are the gateway to a big data project.
https://github.com/harrytruman/ansible-tower
https://github.com/harrytruman/elk-ansible
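As a rough picture of the compare/diff step described in step 3, the Python sketch below diffs two config snapshots using the standard library’s `difflib`. The snapshots are hard-coded strings here. In the setup described above they would come back from an Elasticsearch query, which this sketch deliberately does not attempt. The `config_diff` helper, the labels, and the sample configs are hypothetical.

```python
import difflib
from typing import List


def config_diff(old: str, new: str,
                old_label: str = "before", new_label: str = "after") -> List[str]:
    """Return a unified diff between two config snapshots as a list of lines."""
    return list(difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",  # inputs have no trailing newlines, so emit none either
    ))


# Hypothetical snapshots of the same device taken at two different times.
snapshot_monday = (
    "hostname edge-01\n"
    "interface Gi0/1\n"
    " description uplink\n"
    " mtu 1500\n"
)
snapshot_friday = (
    "hostname edge-01\n"
    "interface Gi0/1\n"
    " description uplink to core\n"
    " mtu 1500\n"
)

diff = config_diff(snapshot_monday, snapshot_friday, "monday", "friday")
print("\n".join(diff))
```

An empty result means the two snapshots are identical, which is a cheap first check before deciding whether a restore is needed at all.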
4. And now that we have all of these basic functions in place, it’s time to begin scale and performance testing. Part of this involves setting up a development and testing framework. The rest of it is purely an exercise in establishing standards that allow people to efficiently learn how to work with and create content using Ansible and Git.
Everything I’ve covered so far can be stood up and configured with basic functionality rather quickly. And at the very least, these specific tools and technologies will all scale with us as quickly as we can develop things to use them.
https://www.landoman.com/2020/02/10/scaling-ansible-and-awx-tower-with-network-and-cloud-inventories/
This all takes time to build out to full scale in a large network, but it’s a tried-and-true, practical way to begin your network automation adoption.
This framework — and the fundamental objective of knowing what’s running on your network at any given time — has been implemented with tremendous success in every network infrastructure I’ve worked on.
The day one results are immediate, and the foundation for all of this can be built in the time it takes to do a POC. The fact collection and logging that we’re doing through Tower and ELK both lend themselves well to a quick implementation and gradual scale-up to running against massive inventories.