Dominic Wellington | Moogsoft | TechWeek18

Ben Fower | 21/12/2018

Dominic Wellington from Moogsoft joined us at TechWeek18 to discuss AI, IT Operations and Data. Hosted by Mat Jordan.

Mat: Our next guest is Dominic Wellington from Moogsoft. Good to meet you.

Dominic: Thanks.

Mat: Thank you for being here. Tell us about Moogsoft.

Dominic: So, Moogsoft is a company in the AIOps space, which is AI applied to IT operations. AI is super buzzwordy, and the big challenge we always have is explaining exactly what it does concretely.

Mat: This is your opportunity.

Dominic: So, the idea is that IT operations has a complexity problem. Infrastructures have got more complex, developers are expected to make changes faster and faster, and automation is happening behind the scenes, making changes on its own. So, there is just too much happening all the time, and it is not possible for either humans or rules-based deterministic systems to keep up with it all. All of this is turning into a tax on organisations' agility, their ability to get things done, and that is where the AI part comes in. We use AI and machine learning techniques to make sense of all that complexity, to filter it out and just show the busy expert humans what they need to know about right now, and hide all of the complexity away from them.

Mat: So, yours is effectively an interface software layer that sits in and interfaces into the database, into everything that people already have today. So, all of those consoles that people have today, that they have layered across the video wall at the head of the NOC.

Dominic: Yeah, instead of having humans staring at those things waiting for something to generate, you have the humans doing their day job, actually trying to provide value to the company, and when something happens that needs their attention the system will come to them and say: hey, something is happening in your patch, on the services that you care about. There are some other colleagues of yours that are also being hit by it. Let us get you all into a chat room, and you can look at all of the data and make a determination about what to do about it.

Mat: Okay. So, I am starting to get it now.

Mat: So specifically, then, it is about collating all the data that the machines and your IT infrastructure are sending out, and providing that easy-to-see GUI so that, rather than you having to constantly monitor, it is actually turning it into a reactive thing rather than you constantly being proactive. So, when a problem occurs, it senses a change in something that is outside of tolerances, then fires a message and alerts you to that specific event, rather than you having to watch an entire raft of events.

Dominic: That is basically it. The last thing that we want to see happen in IT Ops is that the first thing we know about a problem is when you have an angry user at the end of a phone line saying: my thing is broken, I cannot do my job. By that point it is too late, we are already behind the ball. What we want to do is be able to catch those faint early warning signals of something beginning to go wrong that would otherwise be lost in the noise. If we can pick them out, we can show them to the right people. They may not be able to prevent the problem, but they can prevent it from getting to the point that it is affecting somebody's job or their ability to get something done.

Mat: Okay, so give me an example. Are we talking about, say, temperature changes within a data centre room, or what sort of things could it be?

Dominic: It could be something physical like that. We have a telco customer, and one of the key proof points that they used as part of their decision process to procure the software was that we were able to identify a whole bunch of really low-severity, low-priority alerts about dust filters filling up in some equipment out in the field somewhere. Taken one by one they were not significant, but taken all together, correlated and looked at in context, they were indicative of an entire data centre that was about to go down. There was some construction, and the road dust was blowing in. It was getting caught in the dust filters, and the dust filters were doing their job, but once a filter fills up the machinery is going to overheat. One dust filter fills up, one device goes down. You have got redundancy, you can deal with that, so the alert has a low priority. But if you have a bunch of them all happening in the same location within a short space of time, you have a big problem. On the other hand, if you can catch that early, you can send a guy around with a can of compressed air, which is a really cheap fix, and you have avoided a major outage and all of the operations pain and payouts that go with that.
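To illustrate the kind of correlation described in the dust filter example, here is a minimal Python sketch. It is not Moogsoft's algorithm, which the interview does not detail. It simply groups alerts by location and flags any location where several distinct devices raise an alert within a short time window. The `Alert` fields, the threshold of three devices and the 30-minute window are all assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alert:
    device: str
    location: str
    severity: int  # 1 = lowest priority (hypothetical scale)
    timestamp: datetime


def correlated_sites(alerts, min_devices=3, window=timedelta(minutes=30)):
    """Return locations where at least `min_devices` distinct devices
    raised an alert within a single span of length `window`."""
    by_location = defaultdict(list)
    for alert in alerts:
        by_location[alert.location].append(alert)

    flagged = set()
    for location, items in by_location.items():
        items.sort(key=lambda a: a.timestamp)
        start = 0
        for end in range(len(items)):
            # Advance the left edge until the span fits inside `window`.
            while items[end].timestamp - items[start].timestamp > window:
                start += 1
            devices = {a.device for a in items[start:end + 1]}
            if len(devices) >= min_devices:
                flagged.add(location)
                break
    return flagged
```

Each alert on its own stays low priority. Only the combination of the same location, a short time span and several devices raises the flag, which mirrors the reasoning in the example above.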

Mat: Okay, so effectively it monitors anything within a data centre, is what we are saying, any or all structured data.

Dominic: Okay, a user transaction within an application, network access time, resource utilisation, hardware information as we mentioned: to us they are just events. With a retail customer in the US, we did all of their websites from top to bottom: the hardware they ran on, the software that was supposed to support the application, the application itself, wait times for users, etc. And then I got thinking that we had to add one more piece of data, which was a Twitter feed. So, just by looking at frequency of mentions, nothing too clever, but correlated with everything else, that tells us: do we have a problem that is only visible internally, or do we have a problem that is affecting users and they are complaining about it? Because, let's face it, a sudden spike in mentions is probably a complaint, not a compliment.
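The "frequency of mentions" idea can be sketched in a few lines of Python. This is a hedged illustration, not the product's method: it treats the latest per-interval mention count as a spike when it sits more than a chosen number of standard deviations above the mean of the earlier counts, then combines that with whether internal alerts are active. The function names, the threshold of 3 and the returned labels are assumptions made for the example.

```python
from statistics import mean, pstdev


def is_spike(counts, threshold=3.0):
    """True if the last count is more than `threshold` standard deviations
    above the mean of the earlier counts."""
    if len(counts) < 3:
        return False  # not enough history to judge
    history, latest = counts[:-1], counts[-1]
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # Flat history: any increase at all stands out.
        return latest > baseline
    return (latest - baseline) / spread > threshold


def classify(internal_alert_active, mention_counts):
    """Combine internal monitoring with the external mention signal."""
    spike = is_spike(mention_counts)
    if internal_alert_active and spike:
        return "user-facing problem"
    if internal_alert_active:
        return "internal-only problem"
    if spike:
        return "unexplained mention spike"
    return "normal"
```

The point of the design is the one made in the interview: the mention count is deliberately simple, and it only becomes useful once it is read alongside the other events.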

Mat: Yeah, it is all about putting it in context. Okay, and so if you are looking to knit this into your infrastructure, how long does that typically take? Or is it "how long is a piece of string"? Typically, is it complex to integrate this sort of thing, or is it straightforward?

Dominic: So, the reason it is not easy to give a straightforward answer is that it is not just a technology problem. By definition, here we are spanning the boundaries of technology, people and processes. That is a three-legged stool, and the technology is always the easiest leg to deal with. People and processes are much harder. So it depends on how the organisation is structured already, what its tolerance for change is, and whether the users are willing to adopt something or whether they see it as a threat. There are a whole lot of factors going into whether it is going to be a few weeks' adoption or an 18-month adoption.

Mat: Okay. So you have done both basically.

Dominic: Yeah, the good thing though is that it is a staged process. By its nature there are certain features that will work more or less out of the box, so you start getting benefits right away, and the software will pay for itself just on that. But then the idea is that this will all free up time, resources and attention bandwidth that you can invest in getting access to the higher levels of capability, and those will, in turn, free up more resources, so that you can then start thinking: okay, today we manage our processes this way, and maybe we need to think about a different way of doing it. We always say that operations is still very much in the waterfall mode that development moved away from at least a decade ago. It is still very sequential: one person is doing one thing, they are the owner, and then they pass the baton on to someone else, who then does their own one thing, and it is all one thing after another. Dev no longer works like this. The whole DevOps and agile revolution was about collaboration, about people doing a whole bunch of different things at a time, breaking the big complex task down into small and manageable bites. It is about bringing that same way of working to the Ops world, getting people into ChatOps, into collaboration, into multi-owner processes.

Mat: Yes, I guess, the summary as I understand it is this. Effectively, once it is in place and integrated, it pulls in all of that need-to-know information and presents it in a manageable way, to pull people together and enable them, as you say, to collaborate around the fix. If there is a change that is not within the parameters we would expect to see, it works out what it is and reaches out to the right people, notifying them to get them all talking and working towards a fix for your data centre world.

Dominic: Exactly, so if you get an alert, you know it is a real thing. It really is for you, and you need to jump on it right away, as opposed to the classic situation which is known as the sea of red. If every alert is red and the whole screen is just red top to bottom, that is almost as if you did not have the screen at all, because it is not telling you anything useful. At that point you are managing by whoever is screaming loudest right now.

Mat: Yeah, we have all been there. So if people want to find out more information about the Moogsoft software platform and whether it is suitable for their environment, where should they go?

Dominic: The website would be the obvious place to start. We are also on all the social media platforms. If they are cloud users, we also have partnerships with the big cloud platforms. We are a launch partner for Amazon's cloud management competency, and we are in the marketplace. It is the same thing with Azure, so you can also get access to an evaluation system directly on those platforms.

© 2019 - 2020 COMPARE THE CLOUD LTD. All rights reserved.