From Chaos to Clarity: How This SDE Manager Revolutionized Analytics with Continuous Improvement
Enhanced transcript with interviewer insights
INTERVIEWER
All right, next question. For this one, instead of walking me through something complex, which the last question clearly was, I'd like you to pick a problem that you worked on over time, where your focus was on continuous improvement, right? You started with something and you kept iterating, making it better and better. So what is the most significant continuous improvement project that you have led? What was the project, and why was it important?
CANDIDATE
True. Just give me a second. Yeah, I think I've got a very good example here. This also happened in my cafe. As I said right at the beginning, as we were uniting these four Baymouth applications, our team's goal was also to find the commonalities across all these applications and make libraries out of them. During this process we found that all of these applications have analytics modules: they all collect data for analytics and keep sending it to different back ends. So we started creating an analytics SDK, an analytics library, that would do all of these jobs. As we started thinking about it, we found a lot of things that could be done on the client side to make analytics data collection much better. The first thing we did was simple. We would get all the data into this library through some exposed APIs, and from those APIs we would send it directly to the server. That was our first approach. Then we thought that, instead of just dumping the data directly, we could increase the quality of the data that goes to the server right at the client level. That is, if there were events that were not intended to be sent, the product people could define in the configurations which events are supposed to go. Based on that, we started dropping events that were not supposed to go. That was the second iteration.
The third iteration came when we saw that there was often a sudden spike of events, and this spike would crowd the back-end data sources. So we built a throttling mechanism, which was configurable. We educated most of the people that this is a throttle count you set on a daily, monthly, or yearly basis for a particular product or a particular module. If the number of events goes above that count, the library drops all the events beyond it. So that was the next thing that we did.
In the fourth iteration, we noticed that by this stage a lot of configuration files had come up. So we thought we could push these configuration files directly over the internet rather than through a refresh of the application itself. A refresh of the application would require that a new application is built and then put in the Play Store or the App Store. So we created a dynamic download of these configuration JSONs, and through these configuration JSONs the product people were primarily controlling the behavior of their products themselves.
In this way we went on making the SDK itself better and better, and at this stage we are in a place where most of the data is cleaned up, and most of the data quality is raised, right at the client side, before it goes and sits in the server. And we also come to know which part of the collected data is bad.
We also send this bad part of the data to a different container on the servers, so that analytics can be done on it and people can be informed: this is the amount of bad data that you're generating, and you need to take care of it. So we started building dashboards around that too. Once we started building the dashboards, most of the management people could also understand how much bad data was getting created and how to contain it, because it was flooding a lot of back-end parts. It was seamless to cut down some of the events too, and we built it in such a way that it was pretty easy to do on the fly, without changing the applications. We could cut down all the events. So it went on through a lot of iterations. There were a lot of high-powered discussions in the team. I was the one who was getting this entire wheel going, because when we started, the team itself had absolutely no idea how to go about it. At least a certain amount of groundwork had to be done so that people had something to discuss. So from there we built on. For example... do you want me to elaborate more on that? Well, I,
Interviewer Insight
This is an interesting case. The candidate is reusing a story very early on in the interview, which is a troubling sign. The new information presented here does make the case for a different story, which is a net positive, and the candidate clearly outlines a continuous improvement process. Again, net positive. What is missing from this discussion is clear data to support the claims made in the answer. The candidate is paying lip service to a set of issues for which they had to craft solutions, but is not helping the interviewer understand how they parsed or prioritized these issues based on metrics or data.
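For readers who want to picture the iterations the candidate describes (a configured list of allowed events, a configurable throttle count, and a separate container for rejected data), a minimal sketch follows. It is not the candidate's SDK: the class name `AnalyticsGate`, the JSON keys, and the per-day counting scheme are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field

# Hypothetical configuration, standing in for a dynamically downloaded JSON file.
CONFIG_JSON = """
{
  "allowed_events": ["login", "purchase"],
  "throttle": {"max_events_per_day": 3}
}
"""


@dataclass
class AnalyticsGate:
    """Illustrative client-side gate; every name here is an assumption."""
    allowed_events: set
    max_events_per_day: int
    sent: list = field(default_factory=list)       # events forwarded to the server
    rejected: list = field(default_factory=list)   # the separate "bad data" container
    _counts: dict = field(default_factory=dict)    # day -> number of accepted events

    @classmethod
    def from_json(cls, text):
        cfg = json.loads(text)
        return cls(set(cfg["allowed_events"]), cfg["throttle"]["max_events_per_day"])

    def track(self, name, day):
        """Return True if the event is forwarded, False if it is dropped."""
        # Second iteration: drop events that the configuration does not list.
        if name not in self.allowed_events:
            self.rejected.append((name, "not_in_allowlist"))
            return False
        # Third iteration: drop events beyond the configured count for the day.
        if self._counts.get(day, 0) >= self.max_events_per_day:
            self.rejected.append((name, "throttled"))
            return False
        self._counts[day] = self._counts.get(day, 0) + 1
        self.sent.append(name)
        return True
```

Because dropped events are recorded with a reason rather than silently discarded, a gate like this would also produce the raw counts (forwarded, not allowed, throttled) that the interviewer goes on to ask about.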
INTERVIEWER
I'd first like to talk about the underlying principle driving all of these iterations. What was it?
CANDIDATE
So the underlying principle was, primarily: how do we make this entire analytics collection mechanism much more robust and much more effective, so that we get high-quality data right at the beginning, and so that we don't disrupt the servers with a huge amount of data? We just wanted to build something that would collect very high-quality data, that was highly configurable, and that would help reduce the amount of unnecessary data. Primarily, this is what we wanted to do.
Interviewer Insight
Good, but what does this mean in practice? The candidate needs to expand on this so that an interviewer knows exactly what they mean. What metrics? How were they falling short?
INTERVIEWER
I guess I'm confused, because I'm not clear how you're measuring this, right? You're making claims tied to words. "We wanted high-quality data." Cool, I get that. But you're not defining the metrics. You're not defining the outcomes. This is now the third question we're on, and metrics haven't really shown up yet. So I guess I'm curious how you're selling this internally, right? I mean, if it works within your organization that you go, "We need higher-quality data," and everyone just goes, "Yay," and moves forward, fine. But how do you know you're getting better? That's what is not coming through in this answer.
CANDIDATE
Sure, sure. I can elaborate on that. One of the metrics for measuring the quality of the data is that the products are not sending events they're not supposed to send. First of all, we made a list of all the events that are supposed to be sent. In the Confluence page for every product, we started defining the list of events that the product is supposed to send. For each event there are 50 or 60 attributes that they have to send, and we also defined the rules for these attributes, that is, which values are allowed. So we defined this in the Confluence page, and from there we started bringing it into the configuration files. Our first test was that no event that is not defined in these Confluence pages should flow into our servers. That is the first measure of quality. The second is that all the attributes attached to these events should adhere to the rules. They should be in the form that is expected of them. For example, a phone number should not be a big string. Likewise, there are a lot of rules attached to all these attributes. So we defined those rules, and those rules were also brought into the configuration files. The measure of success was that once the data left the client, went to the servers, and sat in Databricks, we saw that the format of the data, everything, was as expected. For example, an email ID has to have an @ sign, and it cannot be a phone number. Even if by mistake you start sending a phone number, the SDK would not send it through. So that's how we measured the quality. Got it. Is that clear, or can I elaborate more? There are many more things I can put in here. A lot of details again,
Interviewer Insight
The candidate needs to look objectively at this answer and ask themselves whether, when explicitly asked, they presented a clear metric. The candidate states that they were checking to make sure that the proper events were showing up. What is the actual metric? What was the target % rate? What specifically were they looking at?
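To make the gap concrete, here is a hedged sketch of how the attribute rules the candidate mentions (an email must contain an @ sign; a phone number must not be an arbitrary string) could yield a rate rather than a description. The rule table, the regular expressions, and the function names are assumptions for illustration only; nothing in the transcript describes the real rule format.

```python
import re

# Hypothetical rule table: attribute name -> predicate on the attribute value.
RULES = {
    # Assumed rule: exactly one "@" with non-space text on both sides.
    "email": lambda v: isinstance(v, str)
    and re.fullmatch(r"[^@\s]+@[^@\s]+", v) is not None,
    # Assumed rule: optional "+" followed by 7 to 15 digits.
    "phone": lambda v: isinstance(v, str)
    and re.fullmatch(r"\+?\d{7,15}", v) is not None,
}


def invalid_attributes(event):
    """Return the sorted names of attributes in the event that break a rule."""
    return sorted(
        name for name, check in RULES.items()
        if name in event and not check(event[name])
    )


def rejection_rate(events):
    """Fraction of events with at least one invalid attribute."""
    if not events:
        return 0.0
    bad = sum(1 for event in events if invalid_attributes(event))
    return bad / len(events)
```

A figure such as `rejection_rate(events)` tracked before and after each iteration is the kind of metric the interviewer is asking for: a single rate with a baseline and a target.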
INTERVIEWER
Yep, this is still... it's fine at a conceptual level, right? "We wanted to make sure the events weren't there." But I have no sense of how common it was before you did this, or of how many of the events that showed up after you did this weren't supposed to be there. For example, what you didn't say was, "We reduced the number of aberrant events by 80%." Right? Or, "We reduced the occurrence rate of aberrant events down to 1%," or 0%, or whatever. So I still don't have a sense of how you were measuring success.
CANDIDATE
Actually, I have those metrics with me right now. Prior to building some of the features like the throttle and other things, we used to see sudden spikes of events. Even a couple of months back there was a sudden spike of events, and the AWS bills used to get very, very high, because the amount of data that got loaded was very, very high. Now we have brought those incidents down to almost none. On average, we measured that in a month there used to be around 7 to 8 such instances where a sudden spike of events would come, and people would huddle and come together and start stopping those events, not publishing a new application to the Play Store. With this analytics SDK that we built, we saw that such events became almost zero. From 7 to 8 in a month they became almost 0, because we introduced these kinds of gates. That was the first thing. The second thing was the quality of data. Before introducing a lot of these checks, on average anywhere between 50 and 60 issues were raised in a month by the data quality team. The data quality team is a team that checks the entire data and the validity of the data present in Databricks, and makes sure that the data is good when it enters Databricks. So no fewer than around 50 to 60 Jira tickets were created every month on these 4 products. With this, we saw that they got reduced to somewhere around 25, 26. On average, 25 to 26 is the number of Jira tickets we started having after making these quality efforts, after putting all these things in place, these configurations and other things. So these are the two prominent things. Primarily, we reduced the AWS bills. For example, whenever a spike happened... in recent times, a few months back, we heard that on average it was, in Indian rupees, something like 5 lakh, ₹500,000, every day. So for a period of 4 days it would be 20 lakhs. I don't know how many dollars that would be. So we reduced all those bills for the company. We never had such an instance after that, and now we don't have such a thing.
Interviewer Insight
It took too many questions to get at measurements. Even the measure given was non-specific and of low fidelity.
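The figures the candidate eventually gave can be turned into the kind of percentage statement the interviewer was asking for. The short calculation below uses only the numbers quoted in the answer (7 to 8 spike incidents a month falling to roughly 0, and 50 to 60 Jira tickets a month falling to 25 to 26); the helper name `percent_reduction` is an illustrative assumption, and the candidate's "almost zero" is treated as 0.

```python
def percent_reduction(before, after):
    """Percentage drop from `before` to `after`, rounded to one decimal place."""
    return round((before - after) / before * 100, 1)


# Spike incidents per month: 7 to 8 before, treated as 0 after.
spike_reduction = percent_reduction(7, 0)

# Jira tickets per month: 50 to 60 before, 25 to 26 after.
# The range endpoints give the most conservative and most generous readings.
tickets_conservative = percent_reduction(50, 26)
tickets_generous = percent_reduction(60, 25)

print(f"Spike incidents reduced by about {spike_reduction}%")
print(f"Data quality tickets reduced by {tickets_conservative}% to {tickets_generous}%")
```

Stated this way, the answer becomes "spike incidents eliminated, and data quality tickets roughly halved (48% to 58%)", which is more specific than the two ranges given in the interview.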
Expert Assessment
Interviewer assessment - would be used in a hiring meeting
A recurring theme throughout the interview is a lack of detail with regard to metrics and data. In this answer block the candidate was repeatedly challenged to provide a clear set of metrics by which they were holding the line for accountability in their product/service, and the candidate was speaking at too high a level to make the case that the team had a strong performance-oriented nature. The interviewer had to ask too many times to finally get to a number, and when the candidate presented numbers, they were presented as ranges and in a way that did not give the interviewer confidence that they were a critical part of the work environment.