Averaging subsets in a large dataset
Hey guys,
I have 17899 readings recorded over an hour. Each reading was taken 0.2 seconds after the previous one. I need to average the data (vertical wind velocity) every 10 seconds, which would be every 10/0.2 = 50 wind velocity readings.
I had a for loop working for the first 359 subsets, but the last one has fewer than 50 readings in it, so it fails. One variable holds the velocity and another holds the time (in 0.2 second increments). How should I set up this loop? With respect to time (every 10 s)? Or velocity (every 50 readings)?
I also have to do this for 30 seconds, 1 minute, 2 minutes, and 10 minutes... Can I make a loop that does all of this?
Thanks a lot for the help.
You could create a counter variable and have it add one to its value every time you grab a data value. If you know that you have 17899 records, you could have your code exit when your counter = 17899.
To do this, you'll only need one while loop that goes through all the data, i.e. while the counter is less than the number of records. Inside this loop you'll need a bunch of counters. Let's work with the 10 seconds first.
So, in your loop you should have an iteration counter, a total counter and a number-of-readings counter. The iteration counter should increment each time you go through the loop. When it hits 50, you'll do the following:
1. Increment the Number of Readings by 1
2. Add the reading to the total counter
3. Reset the Iteration counter to 0
After you're out of the loop, you'll take the total counter and divide it by the number of readings counter.
Best of all, by duplicating the same steps with different variable names, all you need to do is change the iteration counter's threshold from 50 to whatever it needs to be, and you only pass through the data once. (BTW, this is a great thing to learn if you're heading towards a programming career. I have to process gigabytes of flat files in as little time as possible. Every extra pass through the data adds to the run time of the program.)
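As a rough illustration of the single-pass idea, here is a minimal sketch in Python rather than MATLAB. It is not the poster's code: instead of sampling one reading every 50 iterations, it keeps a running sum per window size, so each complete block of readings yields one average. The function name `block_averages_single_pass` and the sample readings are made up for the example; the window sizes assume one reading every 0.2 seconds.

```python
def block_averages_single_pass(readings, window_sizes):
    """Return {window_size: [average of each complete block]} using one pass."""
    sums = {w: 0.0 for w in window_sizes}      # running total per window size
    counts = {w: 0 for w in window_sizes}      # readings seen in the current block
    averages = {w: [] for w in window_sizes}   # finished block averages

    for value in readings:
        for w in window_sizes:
            sums[w] += value
            counts[w] += 1
            if counts[w] == w:                 # block is full: store and reset
                averages[w].append(sums[w] / w)
                sums[w] = 0.0
                counts[w] = 0
    # Any partly filled block left in sums/counts is simply not reported.
    return averages


# Hypothetical stand-in for the 17899 wind velocity readings.
readings = [float(i % 7) for i in range(17899)]

# 10 s, 30 s, 1 min, 2 min and 10 min at 0.2 s per reading.
window_sizes = [50, 150, 300, 600, 3000]
result = block_averages_single_pass(readings, window_sizes)
```

Incomplete blocks at the end of the data are dropped here, which is one possible choice and not the only one.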
Good luck!
this assumes your wind velocity data is in a vector called "data"
your average values will be in the vals{} cell array.
have fun.
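The code from this reply did not survive in the thread as captured. Going by the later replies (floor() used for "maxind", only whole sets kept, results in "vals"), the approach can be sketched roughly as below. This is an illustration in Python, not the original MATLAB; the names `data`, `maxind` and `vals` come from the thread, while `whole_block_means`, the labels and the sample data are assumptions.

```python
def whole_block_means(data, block_size):
    """Average each complete block of block_size readings; ignore the leftover."""
    # Assumed meaning of maxind: number of readings that fit into whole blocks.
    maxind = (len(data) // block_size) * block_size
    return [
        sum(data[start:start + block_size]) / block_size
        for start in range(0, maxind, block_size)
    ]


# Hypothetical stand-in for the wind velocity vector.
data = [float(i % 11) for i in range(17899)]

# Readings per averaging period at 0.2 s per reading.
block_sizes = {"10 s": 50, "30 s": 150, "1 min": 300, "2 min": 600, "10 min": 3000}

# One list of averages per period, analogous to the vals{} cell array.
vals = {label: whole_block_means(data, n) for label, n in block_sizes.items()}
```

The floor only decides how many readings are used; each reported average is still the plain mean of its own block.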
this is good advice for other programming languages, but not matlab.
Thanks a lot, that loop is great. The averages are slightly off though, I think because of the rounding toward negative infinity you have in the "maxind" variable definition? I just averaged the first few sets of 50 readings in Excel and the averages are a little off from what your code gives. Is there anything I can try to fix this?
Also, the first reading is at 0, not 0.2 seconds. Thought that might be a problem also.
Thanks a lot,
Dave
The code is independent of timestamps.
all the floor() function is doing is making sure that the last section has the appropriate number of points in it for the average, it will not affect the values at all.
The code you wrote averages correctly through the 357th set of data (357th set of 50 w readings), but there are two more sets that don't get averaged. Apparently there's a digit off by one somewhere or something haha. Here's my actual code. I don't know why the final 98 readings don't get averaged. Any ideas?
Thanks
yes, but it wouldn't account for that much of a difference, i don't think.
The code posted takes only the number of whole sets and reshapes them into a matrix, so of course the remaining data are not averaged.
For example
17899 / 10 * 0.2 = 357.98
50 * 0.98 = 49
or..
17899 % ((int)(17899 / 10 * 0.2)) = 49
So the code schwaip posted discards the 49 data points that do not fit into a whole set as defined by the period (10 seconds in my example).
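The same arithmetic can be checked with integer division and remainder on the set size. A small sketch in Python, where the variable names are made up for the example:

```python
n_readings = 17899        # total readings
sample_interval = 0.2     # seconds between readings
period = 10               # averaging period in seconds

# Readings per set: 10 / 0.2 = 50 (rounded to guard against float error).
set_size = round(period / sample_interval)

# Whole sets and the readings left over after the last whole set.
whole_sets, leftover = divmod(n_readings, set_size)
print(whole_sets, leftover)  # 357 49
```

So there are 357 complete 10 second sets and 49 readings left over.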
This may be desirable, since the first 357 sets represent data for 10 seconds, and the remaining 49 data represent only a 9.8 second period. It is not technically accurate to compare the two as they represent different things and there is no real way to substitute missing data.
If you want to process the remaining data into an average, then you can isolate them with the following pseudocode, reusing schwaip's variable names in bold and assuming 1-based vectors:
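The pseudocode from this post did not survive in the thread as captured. A rough sketch of the idea, written in Python and therefore using 0-based indexing, could look like this. The names `data` and `maxind` follow the thread; `leftover_mean` and the sample data are assumptions for illustration.

```python
def leftover_mean(data, block_size):
    """Return (mean, count) of the readings after the last whole block, or None."""
    # Number of readings covered by whole blocks (assumed meaning of maxind).
    maxind = (len(data) // block_size) * block_size
    tail = data[maxind:]              # everything that did not fit a whole block
    if not tail:
        return None                   # data divided evenly, nothing left over
    return sum(tail) / len(tail), len(tail)


# Hypothetical stand-in for the wind velocity vector.
data = [float(i) for i in range(17899)]

remainder = leftover_mean(data, 50)
if remainder is not None:
    mean_value, count = remainder
    # count * 0.2 s is the (shorter) period this average actually covers.
    print(mean_value, count)
```

Returning the count alongside the mean makes it easy to state the shorter period the figure covers.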
You could include this figure with an annotation stating that it is derived from data for a different period of time, and give the period. I don't see much worth in that, though.
Edit: Apparently vectors in MATLAB are 1-based.
basically, the behavior you are seeing is what i wrote the code to do.
you can't get an average over 10s of data when all you have is 9.8s (well, you can, but that's not the point of looking at longer term averages)