geodave
Getting settled in
8 Posts

Question: Batch-processing in MATLAB

Hello All!

I'm fairly new to MATLAB and have been given a task that is quite challenging to me.

I have a text file that contains four columns and many rows (on the order of a few hundred thousand rows). I'm trying to write a script in MATLAB that will read the rows and columns from this file, and write only the first three columns into a new text file.

Here's what the original dataset in the text file looks like:

Code:
-16.517754 3.610515 -0.847929 30
-16.472557 3.611480 -0.845726 28
-16.477274 3.617941 -0.846026 30
-16.433626 3.616477 -0.843872 32
-16.431351 3.626801 -0.843872 30
-16.424358 3.630670 -0.843572 32
-16.406473 3.637529 -0.842770 32
-16.406305 3.642901 -0.842820 34
-16.403439 3.655784 -0.842820 34
-16.409687 3.659884 -0.843171 32
Basically, I want columns 1, 2, and 3, but not column 4, written to the new text file.

I have written the following script in MATLAB that works very well for a small dataset (a few thousand rows). It creates two text files: one with a header (x, y, z) and one without.

Code:
clear all

% CHANGE THIS TO YOUR FILE'S NAME
xyz_data = load('serrano_all_data.txt');

x = xyz_data(:,1);
y = xyz_data(:,2);
z = xyz_data(:,3);

B = [x y z];  % new 3-column matrix
 
% no header
dlmwrite('xyz_no_header.txt',B);  % comma delimited

% with headers
fid = fopen('xyz_header.txt','w+t');
fprintf(fid,'x,y,z\n',B);  % writes headers to text file
fclose(fid);
dlmwrite('xyz_ArcMap.txt',B,'-append');  % comma delimited, appends new xyz matrix to text file
However, when I try running this script on a much larger dataset (16 million rows), I get the following error in MATLAB:

Code:
??? Error using ==> horzcat
Out of memory. Type HELP MEMORY for your options.

Error in ==> three_column_utility at 22
B = [x y z];  % new 3-column matrix
I suspect I'm getting this error because the dataset is very large. When I asked a friend who is more familiar with MATLAB than I am, he suggested I process the data in batches rather than all at once. For example, the script would read the first thousand lines and write them to the new text file in the new format (3 columns instead of 4), then read the next thousand lines and append them below the first thousand, and so on until the end of the file is reached (using the feof function, I think).

My problem is that I'm not quite sure how to do this (if this is the right approach, that is). Any help/suggestions/tips would be greatly appreciated!
shwaip
elaborate bot
5,729 Posts
are you doing this in windows or linux?
geodave
Getting settled in
8 Posts
I'm doing this in Windows.
shwaip
elaborate bot
5,729 Posts
if you had been using linux, it could have been a 1-line bash command :P

rather than saying:
Code:
x = xyz(:,1);
y = xyz(:,2);
z  = xyz(:,3);
B = [ x y z ];
you can just address the columns of xyz:

Code:
xyz(:,1:3);
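So, your problem is that you're holding several copies of the data in memory at once: xyz_data itself, then x, y and z, then B. The version below keeps only xyz_data around (plus a temporary 3-column slice while dlmwrite runs).

I think this should work.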

Code:
clear all

% CHANGE THIS TO YOUR FILE'S NAME
xyz_data = load('serrano_all_data.txt');


% no header
dlmwrite('xyz_no_header.txt',xyz_data(:,1:3));  % comma delimited

% with headers
fid = fopen('xyz_header.txt','w+t');
fprintf(fid,'x,y,z\n');  % writes headers to text file
fclose(fid);
dlmwrite('xyz_header.txt',xyz_data(:,1:3),'-append');  % comma delimited, appends the xyz matrix below the header line in the same file
geodave
Getting settled in
8 Posts
Thanks for looking at this. I copied and pasted your modified version of my script, and ran it using the small sample of the dataset. It worked nicely. But when I ran it using the large dataset (~0.6 GB text file, four columns by ~16 million rows), I received the following error:

Code:
??? Error using ==> load
Out of memory. Type HELP MEMORY for your options.

Error in ==> new_utility at 4
xyz_data = load('serrano_all_data.txt');
This differs from the error I got with my old script: back then MATLAB failed in "horzcat", whereas now it fails in "load".

What are your thoughts on this? Again, thanks for spending the time in helping me with this.
shwaip
elaborate bot
5,729 Posts
Hi.

Basically, you're running out of memory. You're right that the file is too large to load in one go.
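To put rough numbers on that, here is a back-of-the-envelope estimate. It assumes the ~16 million rows mentioned above and that load stores everything as 8-byte doubles; the variable names are just for illustration.

```matlab
% Rough memory estimate for the arrays in the scripts above.
% Assumption: ~16 million rows, every value stored as an 8-byte double.
n_rows = 16e6;
bytes_per_double = 8;

full_matrix_mb = n_rows * 4 * bytes_per_double / 2^20;  % xyz_data (4 columns)
one_column_mb  = n_rows * 1 * bytes_per_double / 2^20;  % each of x, y, z
three_col_mb   = n_rows * 3 * bytes_per_double / 2^20;  % B or xyz_data(:,1:3)

fprintf('xyz_data:        %.0f MB\n', full_matrix_mb);
fprintf('x, y, z:         %.0f MB each\n', one_column_mb);
fprintf('3-column matrix: %.0f MB\n', three_col_mb);
```

So xyz_data alone needs close to 500 MB, and MATLAB has to find that as one contiguous block. Reading the text file probably needs extra working memory on top of that, which would explain why load itself fails on a machine without much free RAM.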

try this (it'll probably be slow):

Code:
clear all

% open files
fp = fopen('serrano_all_data.txt');
fp_head = fopen('xyz_header.txt','w+t');
fp_nohead = fopen('xyz_noheader.txt','w+t');

%write headers
fprintf(fp_head,'x,y,z\n');

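while 1
      tline = fgetl(fp); %read line (named tline so it doesn't shadow the built-in line function)
      if ~ischar(tline),break,end; %fgetl returns -1 at end of file
      if isempty(strtrim(tline)),continue,end; %skip blank lines
      spl = regexp(strtrim(tline),'\s+','split'); %split on whitespace, fields stay as text
      fprintf(fp_head,'%s,%s,%s\n',spl{1},spl{2},spl{3}); %fields are strings, so print with %s and end the row with \n
      fprintf(fp_nohead,'%s,%s,%s\n',spl{1},spl{2},spl{3}); %same row to the file with no head
end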

fclose all; %close pointers
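If line-by-line turns out to be too slow, the batch idea from the first post can be sketched with textscan, which reads a fixed number of rows per call. This is only a sketch: the file names sample_xyz.txt and sample_xyz_out.txt and the chunk size are made up for the demo, and it assumes every row has exactly four numeric fields. It writes a three-row sample first so it runs on its own.

```matlab
% Tiny sample input (first three rows from the thread) so the sketch runs on its own.
in_name  = 'sample_xyz.txt';       % hypothetical name, replace with the real file
out_name = 'sample_xyz_out.txt';   % hypothetical name
fid = fopen(in_name, 'w');
fprintf(fid, '-16.517754 3.610515 -0.847929 30\n');
fprintf(fid, '-16.472557 3.611480 -0.845726 28\n');
fprintf(fid, '-16.477274 3.617941 -0.846026 30\n');
fclose(fid);

chunk_size = 2;   % rows per batch, tiny here to force more than one pass

fin  = fopen(in_name, 'r');
fout = fopen(out_name, 'w');
fprintf(fout, 'x,y,z\n');          % header line

while ~feof(fin)
    % %*f parses the fourth column but does not return it
    C = textscan(fin, '%f %f %f %*f', chunk_size);
    B = [C{1} C{2} C{3}];          % at most chunk_size rows in memory
    if isempty(B), break, end
    % fprintf consumes values column by column, so pass the transpose
    fprintf(fout, '%.6f,%.6f,%.6f\n', B.');
end

fclose(fin);
fclose(fout);
```

For the real data, drop the sample-writing step, point in_name at serrano_all_data.txt, and raise chunk_size to something like 100000. The %.6f format keeps the six decimals shown in the sample rows.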
geodave
Getting settled in
8 Posts
I tried your latest version of the script, and it was taking quite a long time to process the text file (I had to force MATLAB to quit after about 1 hour from the start of the run). However, I tried running your older modified version of the script from the post where you suggested I address the columns of x, y, and z as follows:

Code:
xyz(:,1:3);
using a computer with more RAM... and it worked! It took ~39 minutes to process the ~0.6 GB text file, but it did it flawlessly!

I appreciate your help with this. Thank you!