How much data do you need?

How much data do you need – Lean 6 Sigma Mini Lesson.mp4 

Hi everyone! Cedro here with Lean 6 Sigma Toolbox. Today we’re going to answer the question, “How much data do I need in order to prove that the project I’m working on is actually having an impact on my organization?” I just got off a call with my friend Jeffrey. He’s going through my Lean Six Sigma Green Belt class and he’s working on a project that is looking at how much time workers spend away from their workstations doing things like fetching materials, looking for parts, looking for instructions, anything that keeps them from actually doing their value-added work. He asked me how much data he needed to collect. I said, “Well, it depends. How much of an improvement do you think you can make in their time away from the workstation? That’s going to tell you how much data you need to collect.” A lot of the time, people will say “collect 30 data points or 40 data points,” and they just kind of throw some numbers out there. But it’s actually really easy to figure out exactly how many data points you need. So I’m going to show you how to do that.

This is the data that he has now. So he has the date, the shift that they’re working on, which workstation he’s measuring, and then the amount of time they spend away from their workstation, when they’re actually not being productive. And he’s only got like 15 data points. And again, the question is how many data points do you need? So how do you figure that out? Well, the first thing you need to do is figure out what your average and your standard deviation are for this data set. So to do that, we’re just going to take the average. Just a good old-fashioned average. Select these data points to the right and hit Enter. It looks like our average is 15.8. And now we also need our standard deviation. I’m going to use STDEV.S, the sample standard deviation.
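For anyone following along outside of Excel, here is a minimal sketch of the same two calculations in Python. The readings in `minutes_away` are made-up illustrative values, not the numbers from Jeffrey's spreadsheet, so the results will not match the 15.8 shown in the lesson.

```python
import statistics

# Hypothetical minutes-away readings, invented for illustration only.
# They are NOT the data from the lesson's spreadsheet.
minutes_away = [12.0, 18.5, 15.0, 21.0, 9.5, 16.0, 14.5, 19.0]

# Plain arithmetic mean, the same idea as Excel's AVERAGE.
average = statistics.mean(minutes_away)

# Sample standard deviation (divides by n - 1), the same idea as Excel's STDEV.S.
std_dev = statistics.stdev(minutes_away)

print(f"average = {average:.2f} minutes, standard deviation = {std_dev:.2f} minutes")
```

The sample version (n - 1) is the right one here because the readings are a sample of the process, not every minute every worker ever spent away.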

Then the next question you should be asking is, how much improvement am I going to make to the process? Because the amount of improvement you make decides how many data points you need to collect to actually detect that improvement. If it’s really small, you’re going to need a lot of data points. If you’re going to make a massive improvement, then you’re only going to need a few data points. So right here, we’re going to assume that we cut that time roughly in half. So we went from 15.8 down to about 7.5. And let’s assume that our standard deviation is going to stay the same, about 4.1. Great. So that’s the improvement we expect to make. So we’re going to detect a difference of 15.8 minus 7.5, which is a difference of 8.3. That’s the average number of minutes that we think we can improve this process by, by helping keep our workers at their workstations.
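The arithmetic here is small enough to check in a few lines. This sketch uses the numbers quoted in the lesson; the "standardized difference" at the end is an extra step that is not in the lesson, included only to show how large the planned improvement is relative to the spread of the data.

```python
current_mean = 15.8  # minutes away today, from the lesson
target_mean = 7.5    # minutes away after the change, the lesson's "cut in half" goal
std_dev = 4.1        # minutes, assumed unchanged after the improvement

# The difference in means we are hoping to detect.
difference = current_mean - target_mean

# Extra step (not in the lesson): the difference measured in standard deviations.
standardized_difference = difference / std_dev

print(f"difference to detect: {difference:.1f} minutes")
print(f"that is about {standardized_difference:.2f} standard deviations")
```

A shift of roughly two standard deviations is very large, which is why only a handful of data points will be needed to see it.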

So the next thing we’re going to do is go up to SigmaXL, click on Statistical Tools, go down to Power and Sample Size Calculators, and go to the 2 Sample t-Test calculator. A 2 sample t-test is simply a test that compares a sample of data from before and after you made a process change to see if you can detect a shift in the mean. In other words, did the average performance of that process actually change? We’re going to select Sample Size in the little “Solve For” box, because we want to know how many samples we need. For our beta risk, we’re going to use 10%, which means a power of 0.9. So 0.9, that’s our power, 90%, which gives us a 10% beta risk. That’s the risk of making a Type 2 error.

What is the Type 2 error? Well, it’s the risk of saying that there wasn’t a change in the process when in fact there actually was. So we actually improved the time that our workers spend away from their stations. We reduced it, but we didn’t detect that in our data. So we’re going to give ourselves a 10% risk of that. And then on the opposite side, for Type 1, we’ve got an alpha risk down here. We’re going to give ourselves a 5% risk of making a Type 1 error.
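One way to get a feel for Type 2 risk is to simulate it. The sketch below is not part of the lesson: it assumes the measurements are normally distributed, uses the lesson's numbers (a real drop of 8.3 minutes from 15.8, standard deviation 4.1), and tries three hypothetical group sizes. The critical values in `T_CRIT` are standard two-sided 5% values from a t table, hard-coded to keep the example free of third-party libraries.

```python
import random
import statistics

# Two-sided 5% critical t values from a standard t table, keyed by group size n
# (degrees of freedom = 2n - 2, i.e. 4, 8 and 12). Group sizes are hypothetical.
T_CRIT = {3: 2.776, 5: 2.306, 7: 2.179}

def miss_rate(n, shift=8.3, sd=4.1, trials=2000, seed=1):
    """Fraction of simulated before/after studies that miss a real shift."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        # Assumption: normally distributed times, same spread before and after.
        before = [rng.gauss(15.8, sd) for _ in range(n)]
        after = [rng.gauss(15.8 - shift, sd) for _ in range(n)]
        # Pooled two-sample t statistic for equal group sizes.
        pooled_var = (statistics.variance(before) + statistics.variance(after)) / 2
        std_err = (2 * pooled_var / n) ** 0.5
        t_stat = (statistics.mean(before) - statistics.mean(after)) / std_err
        if abs(t_stat) <= T_CRIT[n]:
            misses += 1  # the improvement was real, but the test did not flag it
    return misses / trials

rates = {n: miss_rate(n) for n in T_CRIT}
for n, rate in rates.items():
    print(f"{n} per group: missed the improvement in {rate:.0%} of simulated studies")
```

Each "miss" is a Type 2 error: the process really did improve, but the data set was too small to show it. Under these assumptions the miss rate should fall as the group size grows, which is exactly the trade-off the beta risk setting controls.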

And what is the Type 1 error? Well, it’s the risk of detecting a difference, saying that there was a change in the process, when in fact there actually wasn’t. So we’re only going to give ourselves a 5% chance of making that mistake. The improvement in the mean is 8.3, so we think we’re going to improve it by 8.3 minutes, and our standard deviation was 4.1. So based on that, all we have to do is click OK and it’s going to tell us right here how many samples we need before and after to detect a change in the mean of 8.3 minutes. So in this case it says we need a sample size of seven. So that means we have to take seven data points before we make changes to the process, and then we have to get another seven data points after we make changes to the process. If we do, we can detect a change in the mean of 8.3, and that’s it. Now you know that you only need seven samples before you can go and change the process. Now if you hadn’t done that calculation, you might go off and collect 30 data points or 40 data points. You might think you need a lot more data than you actually do, but because we’re planning on making such a significant impact on this process, we only need a very small data set.
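For readers without SigmaXL, the same inputs can be run through a textbook approximation. This is a rough sketch, not SigmaXL's actual algorithm: it assumes a two-sided test, uses the normal-approximation formula for a two-sample comparison, and then adds a commonly cited small-sample adjustment (often attributed to Guenther) of one quarter of the squared alpha z-value. The function name `samples_per_group` is made up for this example.

```python
import math
from statistics import NormalDist

def samples_per_group(difference, std_dev, alpha=0.05, power=0.90):
    """Approximate per-group sample size for a two-sided 2 sample t-test.

    Returns (normal-approximation n, n with small-sample adjustment).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided alpha risk
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta risk
    effect = difference / std_dev                  # shift in standard deviations
    n_normal = 2 * (z_alpha + z_beta) ** 2 / effect ** 2
    n_adjusted = n_normal + z_alpha ** 2 / 4       # small-sample adjustment
    return math.ceil(n_normal), math.ceil(n_adjusted)

# Inputs from the lesson: 8.3-minute improvement, 4.1 standard deviation,
# 5% alpha risk, 90% power (10% beta risk).
n_normal, n_adjusted = samples_per_group(8.3, 4.1)
print(f"normal approximation: {n_normal} per group")
print(f"with small-sample adjustment: {n_adjusted} per group")
```

With the lesson's inputs, the plain normal approximation gives 6 per group and the adjusted version gives 7, the same number the calculator reported. Calling `samples_per_group(2.0, 4.1)` instead shows the other side of the story: a much smaller expected improvement pushes the requirement to roughly 90 data points per group.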

So what’s the moral of the story? Make big impacts on your business using data, and just use the right amount of data. It’s very easy to figure out how much you need. If this is interesting to you and you want to start making a big impact in your business, reach out to me at lean6sigmatoolbox.com. I’d love to talk to you about how we can make changes and use process improvement and Lean 6 Sigma to have a big impact on your business.