R blog – Bishwajit Ghose

Making toy data

Method 1: manual entry

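# data.frame() keeps price numeric; cbind() would coerce every column to character.
# Every column needs five values, so a fifth color ('steelblue') is added here.
toy <- data.frame(price = c(120, 240, 360, 480, 600),
                  name  = c('jack', 'rio', 'irma', 'tio', 'tess'),
                  color = c('tomato', 'midnightblue', 'mintcream', 'lightcoral', 'steelblue'))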
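The choice of container matters for manual entry: cbind() on vectors of mixed type returns a character matrix, while data.frame() keeps each column's own type. A minimal sketch with made-up values (the object names m and d are only illustrative):

```r
price <- c(120, 240, 360)
name  <- c("jack", "rio", "irma")

m <- cbind(price = price, name = name)       # matrix: everything coerced to character
d <- data.frame(price = price, name = name)  # data frame: column types preserved

typeof(m)        # "character"
class(d$price)   # "numeric"
```

With the matrix version, arithmetic such as mean(m[, "price"]) would not work without converting back to numeric first.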

Method 2: Using the seq function of base R.

df <- data.frame(individual = paste0("hallo ", seq(1, 60)),
                 value = sample(seq(10, 100), 60, replace = TRUE))

Method 3: Using the expand.grid function

df <- expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
            sex = c("Male","Female"), country=c('nep','ind', 'sri'))
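expand.grid builds one row for every combination of the supplied values, so the number of rows is the product of the vector lengths. A quick check on the same call:

```r
df <- expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
                  sex = c("Male", "Female"), country = c("nep", "ind", "sri"))

nrow(df)                       # 5 * 5 * 2 * 3 = 150
nrow(unique(df)) == nrow(df)   # TRUE: no combination is repeated
```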

Generating summary tables

# Using the latest df (tbl_summary() comes from the gtsummary package):

library(gtsummary)
tbl_summary(df)

The output opens in the default browser.

Summarising an entire df (using descr package):
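The code for this step did not survive in the post. As a stand-in (not the descr package call itself), base R's summary() gives a per-column overview of a whole data frame. A minimal sketch, assuming the expand.grid df from Method 3:

```r
df <- expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
                  sex = c("Male", "Female"), country = c("nep", "ind", "sri"))

# quartiles and mean for numeric columns, counts for factor columns
s <- summary(df)
s
```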

Summarising selected vars:
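The code for this step is also missing. A base R sketch of the same idea, again using summary() as a stand-in and assuming the expand.grid df from Method 3: select the columns by name first, then summarise.

```r
df <- expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
                  sex = c("Male", "Female"), country = c("nep", "ind", "sri"))

# keep only the variables of interest, then summarise
sel <- summary(df[, c("height", "weight")])
sel

mean(df$height)   # 70, since every height appears equally often
```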

Basic dataframe queries

# Inquiring the class attribute of an object (a dataframe in this case):

class(df)

# printing the number of rows and cols:

dim(df)

# print first 6 rows:

head(df)
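To see what these three calls return, here they are on the expand.grid df from Method 3 (an assumption; any data frame would do):

```r
df <- expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
                  sex = c("Male", "Female"), country = c("nep", "ind", "sri"))

class(df)        # "data.frame"
dim(df)          # 150 4  (rows, then columns)
nrow(head(df))   # 6: head() returns the first six rows by default
```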

Handling missing values

# Let us create a df with some missing cells:

col1 <- seq(11, 20); col2 <- seq(23, 30); col3 <- seq(43, 51)
# padding a vector to length 10 fills the new positions with NA
length(col2) <- 10; length(col3) <- 10
df <- data.frame(col1, col2, col3)

Then count the missing values:

sum(is.na(df))

This correctly identifies the number of missing values as 3. We can either remove the rows that contain NA values to get a complete df with the following:

df1<- na.omit(df)
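Note that na.omit drops whole rows, not single cells. A small self-contained sketch (the values are illustrative, chosen to give three missing cells):

```r
df <- data.frame(col1 = 11:20,
                 col2 = c(23:30, NA, NA),
                 col3 = c(43:51, NA))

sum(is.na(df))    # 3 missing cells
df1 <- na.omit(df)
nrow(df1)         # 8: rows 9 and 10 each contained at least one NA
```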

Or replace them with the column mean using the zoo package:

library(zoo)
na.aggregate(df)
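If zoo is not installed, the same column-mean replacement can be sketched in base R. The helper name impute_mean is hypothetical, and the data values are illustrative:

```r
df <- data.frame(col1 = 11:20,
                 col2 = c(23:30, NA, NA),
                 col3 = c(43:51, NA))

# replace each NA with the mean of the non-missing values in that column
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

filled <- as.data.frame(lapply(df, impute_mean))
filled$col2[9]    # 26.5, the mean of 23:30
```

This only makes sense for numeric columns; character or factor columns would need a different rule.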

Or, to drop columns in which every value is missing:

df <- df[,colSums(is.na(df))<nrow(df)]
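The toy df above has no column that is entirely missing, so here is a small sketch with one (column b), to show the filter doing something. The names a, b, c and keep are illustrative:

```r
df <- data.frame(a = c(1, NA, 3),
                 b = c(NA, NA, NA),
                 c = c("x", "y", NA))

# a column is kept if its NA count is below the number of rows
keep <- colSums(is.na(df)) < nrow(df)
df2 <- df[, keep, drop = FALSE]   # drop = FALSE keeps a data frame even if one column remains

names(df2)   # "a" "c"
```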

Changing letter case using the stringi package

# creating some random names (randomStrings() comes from the random package)

library(random)
first_name <- as.vector(randomStrings(n = 10, len = 5, digits = FALSE))
last_name <- as.vector(randomStrings(n = 10, len = 7, digits = FALSE))

# concatenating first and last names

names <- paste(first_name, last_name)

# converting to a df

names <- as.data.frame(names)

The letter case is currently inconsistent, which can be remedied with the stri_trans_* functions that come with the stringi package.

The following options are available:

# To proper

library(stringi)
names$names <- stri_trans_totitle(names$names)
names

# To lower

names$names <- stri_trans_tolower(names$names)
names

# To upper

names$names <- stri_trans_toupper(names$names)
names
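For an offline variant of the whole section, stringi can also generate the random strings itself. This is a sketch that swaps randomStrings for stri_rand_strings; the object name full and the lowercase-only pattern are assumptions made for illustration:

```r
library(stringi)

# ten random lowercase first names (5 letters) and last names (7 letters)
first_name <- stri_rand_strings(10, 5, pattern = "[a-z]")
last_name  <- stri_rand_strings(10, 7, pattern = "[a-z]")
full <- paste(first_name, last_name)

stri_trans_totitle("jack SPARROW")   # "Jack Sparrow"
stri_trans_tolower("jack SPARROW")   # "jack sparrow"
stri_trans_toupper("jack SPARROW")   # "JACK SPARROW"

# the same functions are vectorised over a whole column
proper <- stri_trans_totitle(full)
```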

Subsetting df

