Tuesday, November 5, 2013

CSV to ARFF Converter

In my data mining experience, I have always found it easier to work with ARFF (Attribute-Relation File Format) files than with CSV (Comma-Separated Values) files, so I decided to write a CSV-to-ARFF converter in Python.  It's very simple: it opens a CSV file that the user specifies, rewrites the contents in ARFF format, and saves the result as an .arff file.

This converter uses a single class, called convert.  The class has an __init__() method and two additional methods: one to read the CSV file and one to write to an ARFF file.

 class convert(object):
    def __init__(self):
       # call csvInput()
       # call arffOutput()
       pass

    def csvInput(self):
       # import CSV
       pass

    def arffOutput(self):
       # export ARFF
       pass

The __init__() method calls the two other methods: first the one that reads in the CSV file, then the one that writes the ARFF file.

 def __init__(self):  
    self.csvInput()  
    self.arffOutput()

In the csvInput() method, the user is prompted to enter the name of the CSV file.  The .csv extension is stripped from the user's input so that the base name can be reused later, and the file is read in with the comma treated as the delimiter between fields.  Once every line has been read, the file is closed.  If the file is not found, the user is prompted to try again.

 def csvInput(self):  
    #prompt user for input  
    #remove .csv file extension
    #try opening file
       #read file lines
       #close file
    #IOError (if file does not exist)
       #call csvInput() again
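
As a rough sketch of the outline above (not necessarily the code on GitHub), the reading step could look like the following. The function names read_csv and csv_input and the prompt parameter are hypothetical, and the sketch retries with a loop rather than by calling the method again.

```python
import csv

def read_csv(name):
    """Read <name>.csv and return (base_name, rows)."""
    # Strip a trailing .csv so the base name can be reused for the .arff file
    base = name[:-4] if name.lower().endswith(".csv") else name
    # "with" closes the file once every line has been read
    with open(base + ".csv", newline="") as f:
        rows = list(csv.reader(f, delimiter=","))
    return base, rows

def csv_input(prompt=input):
    """Keep asking for a file name until the file can be opened."""
    while True:
        name = prompt("Enter the CSV file name: ")
        try:
            return read_csv(name)
        except IOError:
            # Raised when the file does not exist or cannot be read
            print("File not found, please try again.")
```

Passing the prompt function in as a parameter is only a convenience for trying the function out without typing at the keyboard; by default it falls back to input().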

In the arffOutput() method, a new file is created with the .arff file extension.  The method then writes the data from the CSV file to the newly created ARFF file.  It first writes the name of the relation, then the names of all the attributes along with their data types.  Finally, it writes the data rows.  Once that is complete, the file is closed.

 def arffOutput(self):
    #create new .arff file
    #write @relation to file
    #get attribute types
       #write @attribute to file   
    #get class items
    #write class @attribute to file
    #write @data to file
       #close file
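
To make the layout of the output concrete, here is a minimal sketch of the writing step. It assumes the attribute declarations have already been worked out and are passed in as (name, type) pairs; the names arff_lines and write_arff are hypothetical, and the sketch does no quoting of names or values that contain spaces or commas.

```python
def arff_lines(relation, attributes, data):
    """Build the lines of an ARFF file.

    attributes: list of (name, type) pairs, where type is either
                "numeric" or a nominal spec such as "{yes,no}"
    data:       list of rows, each a list of strings
    """
    lines = ["@relation " + relation, ""]
    for name, arff_type in attributes:
        lines.append("@attribute %s %s" % (name, arff_type))
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return lines

def write_arff(base, relation, attributes, data):
    """Write <base>.arff; the file is closed when the with-block ends."""
    with open(base + ".arff", "w") as f:
        f.write("\n".join(arff_lines(relation, attributes, data)) + "\n")
```

In this sketch the class attribute is simply the last entry in attributes, so it ends up as the last @attribute line before @data.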

I have it set up so that the data type options are either numeric or nominal, since those are the ones that I most commonly use. The complete code can be found on my GitHub.
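
One simple way to choose between the two types, shown here only as an assumption about how it could be done, is to call a column numeric when every value parses as a number and nominal otherwise. The function name attribute_type is hypothetical.

```python
def attribute_type(values):
    """Return "numeric" if every value parses as a float,
    otherwise a nominal spec listing the distinct values."""
    try:
        for v in values:
            float(v)  # raises ValueError for non-numeric text
    except ValueError:
        # Nominal: list each distinct value once, in sorted order
        return "{%s}" % ",".join(sorted(set(values)))
    return "numeric"
```

For example, attribute_type(["1", "2.5", "-3"]) gives "numeric", while attribute_type(["red", "blue", "red"]) gives "{blue,red}". Note that float() also accepts strings such as "nan" and "inf", so a column containing only those would be treated as numeric.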
