Chapter 12 DataFrame

This chapter explains how to create DataFrame object, how to access its elements, and its member functions. In Rcpp, DataFrame is implemented as a kind of vector. In other words, Vector is a vector whose element is scalar value, while DataFrame is a vector whose elements are Vectors of the same length. Therefore, Vector and DataFrame have many common methods of creating objects, accessing elements, and member functions.

12.1 Creating a DataFrame object

DataFrame::create() is used to create a DataFrame object. Also, use Named() or _[] if you want to specify column names when creating DataFrame object.

// Creating DataFrame df from Vector v1, v2
DataFrame df = DataFrame::create(v1, v2);
// When giving names to columns
DataFrame df = DataFrame::create( Named("V1") = v1 , _["V2"] = v2 );

When you create a DataFrame withDataFrame::create(), the value of the originalVector element will not be duplicated in the columns of the DataFrame, but the columns will be the “reference” to the original Vector. Therefore, changing the value of the original Vector changes the value of the columns. To avoid this, we use the clone() function to duplicate the value of the Vector element when creating a DataFrame column.

To see the difference between using the clone() function and not using it, see the code example below. In the code example, we are creating DataFrame df from Vector v. There, column V1 is a reference to v, and column V2 replicates the value of v by the clone () function. After that, if you change to Vector v, the values of column V1 is changed, but V2 is not affected.

// [[Rcpp::export]]
DataFrame rcpp_df(){
    // Creating vector v
    NumericVector v = {1,2};
    // Creating DataFrame df
    DataFrame df = DataFrame::create( Named("V1") = v,         // simple assign
                                      Named("V2") = clone(v)); // using clone()
    // Changing vector v
    v = v * 2;
    return df;
}

Execution result

> rcpp_df()
  V1 V2
1  2  1
2  4  2

12.2 Accessing DataFrame elements

When accessing a specific column of DataFrame, the column is temporarily assigned to Vector object and accessed via the object. As with Vector, the DataFrame column can be specified by a numeric vector (column number), a string vector (column name), and a logical vector.

NumericVector v1 = df[0];
NumericVector v2 = df["V2"];

As with DataFrame creation, assigning aDataFrame column to Vector in the above way will not copy the column value to Vector object, but it will be a “reference” to the column. Therefore, when you change the values of Vector object, the content of the column will also be changed.

If you want to create a Vector by copying the value of the column, use clone() function so that the value of the original DataFrame column is not changed.

NumericVector v1 = df[0]; // v1 becomes "reference" to the 0th column of df
v1 = v1 * 2;              // Changing the value of v1 also changes the value of df[0]

NumericVector v2 = clone(df[0]); // Duplicate the value of the element of df[0] to v2
v2 = v2*2;                       // Changing v2 does not change the value of df [0]

12.3 Member functions

In Rcpp, DataFrame is implemented as certain kinds of vectors. In other words, Vector is a vector whose elements are scalar values, and DataFrame is a vector whose elements are Vectors. Therefore, DataFrame has many member functions common to Vector.

12.3.1 length() size()

Returns the number of columns.

12.3.2 nrows()

Returns the number of rows.

12.3.3 names()

Returns the column name as a CharacterVector.

12.3.4 offset(name) findName(name)

Returns the numerical index of the column with the name specified by the string “name”.

12.3.5 fill(v)

fills all the columns of this DataFrame withVector v.

12.3.6 assign( first_it, last_it)

Assign columns in the range specified by the iterators first_it and last_it to this DataFrame.

12.3.7 push_back(v)

Add Vector v to the end of thisDataFrame.

12.3.8 push_back( v, name )

Append a Vector v to the end of thisDataFrame. Specify the name of the added column with the string “name”.

12.3.9 push_front(x)

Append a Vector v at the beginning of thisDataFrame.

12.3.10 push_front( x, name )

Append a Vector v at the beginning of thisDataFrame. Specify the name of the added column with the string “name”.

12.3.11 begin()

Returns an iterator pointing to the first column of this DataFrame.

12.3.12 end()

Return an iterator pointing to the end of this DataFrame.

12.3.13 insert( it, v )

Add Vector v to this DataFrame at the position pointed by the iterator it and return an iterator to that element.

12.3.14 erase(i)

Delete the ith column of this DataFrame and return an iterator to the column just after erased column.

12.3.15 erase(it)

Deletes the column specified by the iterator it and returns an iterator to the column just after erased column.

12.3.16 erase(first_i, last_i)

Deletes the first_ith to last_i - 1th columns and returns an iterator to the column just after erased column.

12.3.17 erase(first_it, last_it)

Deletes the range of columns from those specified by the iterator first_it to those specified by last_it - 1 and returns an iterator to the column just after the erased columns.

12.3.18 containsElementNamed(name)

Returns true if this DataFrame has a column with the name specified by the string name.

12.3.19 inherits(str)

Returns true if the attribute “class” of this object contains the string str.