Bishwajit Ghose

Making toy data

Method 1: manual entry

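# data.frame() keeps price numeric; every column needs the same number of values
toy <- data.frame(price = c(120, 240, 360, 480, 600),
                  name  = c('jack', 'rio', 'irma', 'tio', 'tess'),
                  color = c('tomato', 'midnightblue', 'mintcream', 'lightcoral', 'steelblue'))  # fifth color added to match the 5 rows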
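When numeric and character columns are mixed, the choice between cbind() and data.frame() matters. The short sketch below uses made-up vectors (`price`, `name`, `m` and `d` are illustrative names) to show that cbind() returns a matrix in which everything is coerced to character, while data.frame() keeps each column's type.

```r
price <- c(120, 240, 360)
name  <- c("jack", "rio", "irma")

m <- cbind(price, name)        # matrix: numbers are coerced to character
d <- data.frame(price, name)   # data frame: each column keeps its own type

is.matrix(m)         # TRUE
typeof(m)            # "character"
is.numeric(d$price)  # TRUE
```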

Method 2: Using the seq function of base R.

df <- data.frame(individual = paste("hallo ", seq(1, 60), sep = ""),
                 value = sample(seq(10, 100), 60, replace = TRUE))
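Because sample() draws at random, the value column changes on every run. A minimal sketch, assuming reproducible toy data is wanted: fixing the seed with set.seed() before each call gives the same draw. The seed 42 and the names `df_a` and `df_b` are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed, chosen only for illustration
df_a <- data.frame(individual = paste("hallo ", seq(1, 60), sep = ""),
                   value = sample(seq(10, 100), 60, replace = TRUE))

set.seed(42)  # same seed again, so sample() repeats the same draw
df_b <- data.frame(individual = paste("hallo ", seq(1, 60), sep = ""),
                   value = sample(seq(10, 100), 60, replace = TRUE))

identical(df_a, df_b)  # TRUE
```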

Method 3: Using the expand.grid function

df <- expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
            sex = c("Male","Female"), country=c('nep','ind', 'sri'))
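expand.grid() builds one row for every combination of the supplied values, so the number of rows is the product of the lengths of the inputs. The sketch below checks this for the grid above; `n_expected` is an illustrative name.

```r
df <- expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
                  sex = c("Male", "Female"), country = c("nep", "ind", "sri"))

# Expected rows = product of the number of distinct values per variable
n_expected <- length(seq(60, 80, 5)) * length(seq(100, 300, 50)) * 2 * 3

nrow(df)     # 150
n_expected   # 150
```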

Generating summary tables

# Using the latest df (tbl_summary comes from the gtsummary package):

library(gtsummary)
tbl_summary(df)

The output opens in the default browser:

Summarising an entire df (using descr package):

Summarising selected vars:
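As a rough base R sketch (not necessarily the package call used originally), selected variables can be summarised by indexing the columns first. The column choice here is only an example; with gtsummary loaded, the `include` argument of tbl_summary() serves a similar purpose.

```r
df <- expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
                  sex = c("Male", "Female"), country = c("nep", "ind", "sri"))

# Numeric summaries for two selected columns only
sel <- summary(df[, c("height", "weight")])
sel

# Frequency table for a categorical column
table(df$sex)
```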

Basic dataframe queries

# Inquiring the class attribute of an object (a dataframe in this case):

class(df)

# printing the number of rows and cols:

dim(df)

# print first 6 rows:

head(df) 
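A few more base R queries that tend to be useful alongside the ones above, sketched on the expand.grid data frame from earlier:

```r
df <- expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
                  sex = c("Male", "Female"), country = c("nep", "ind", "sri"))

str(df)        # column types and a preview of the values
names(df)      # column names
nrow(df)       # number of rows only
ncol(df)       # number of columns only
tail(df, 3)    # last 3 rows
```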

Handling missing values

# Let us create a df with some missing cells:

col1 <- seq(11, 20); col2 <- seq(23, 30); col3 <- seq(43, 51)
# Padding to length 10 fills the tail with NA (2 in col2, 1 in col3)
length(col2) <- 10; length(col3) <- 10
df <- data.frame(col1, col2, col3)

And check for missing values:

sum(is.na(df))

This correctly identifies the number of missing values as 3. We can either drop the rows that contain an NA to get a complete df:

df1<- na.omit(df)

Or replace them with the column mean using the zoo package:

library(zoo)
na.aggregate(df)

Or drop the columns in which every value is missing:

df <- df[, colSums(is.na(df)) < nrow(df), drop = FALSE]
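For comparison, here is a minimal base R sketch of the same ideas, assuming a toy data frame similar to the one above (rebuilt here so the block runs on its own). It counts missing values per column and then fills each NA with its column mean without any extra package; `df_filled` and `col_mean` are illustrative names.

```r
# Toy data: 2 NAs in col2, 1 NA in col3
df <- data.frame(col1 = 11:20,
                 col2 = c(23:30, NA, NA),
                 col3 = c(43:51, NA))

# Missing values per column
colSums(is.na(df))

# Replace each NA with the mean of its own column
df_filled <- df
for (j in names(df_filled)) {
  col_mean <- mean(df_filled[[j]], na.rm = TRUE)
  df_filled[[j]][is.na(df_filled[[j]])] <- col_mean
}

sum(is.na(df_filled))  # 0
```

Mean imputation turns integer columns into numeric ones, since the mean is generally not a whole number.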

Changing letter case using the stringi package

# Creating some random names (randomStrings comes from the random package, which queries random.org)

library(random)
first_name <- as.vector(randomStrings(n = 10, len = 5, digits = FALSE))
last_name <- as.vector(randomStrings(n = 10, len = 7, digits = FALSE))

# Concatenating first and last names

names <- paste(first_name, last_name)

# Converting to a df

names <- as.data.frame(names)

The letter case is currently inconsistent, which can be remedied with the stri_trans_* functions from the stringi package. The following options are available:

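# To proper (title) case; the functions take the character column, not the whole df

library(stringi)
names[[1]] <- stri_trans_totitle(names[[1]])
names

# To lower

names[[1]] <- stri_trans_tolower(names[[1]])
names

# To upper

names[[1]] <- stri_trans_toupper(names[[1]])
names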
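If stringi is not available, base R covers the same three conversions. This is a sketch on a small made-up vector (`x` and its values are hypothetical, used instead of the random.org strings so the result is reproducible); the title-case step relies on a Perl-style regular expression rather than a dedicated function.

```r
x <- c("aNNa liNDqvist", "joHN smiTH")  # made-up names with mixed case

lower <- tolower(x)
upper <- toupper(x)

# Title case: lower everything, then upper-case the first letter of each word
title <- gsub("\\b([a-z])", "\\U\\1", tolower(x), perl = TRUE)

lower  # "anna lindqvist" "john smith"
upper  # "ANNA LINDQVIST" "JOHN SMITH"
title  # "Anna Lindqvist" "John Smith"
```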

Subsetting df

The subsetting examples are in this PDF: https://infoart.ca/wp-content/uploads/2022/04/subset.pdf
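The contents of the PDF are not reproduced here. As a generic base R sketch of two common ways to subset a data frame, using the expand.grid data from earlier (`females_tall` and `nep_hw` are illustrative names, and the conditions are arbitrary):

```r
df <- expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
                  sex = c("Male", "Female"), country = c("nep", "ind", "sri"))

# Rows by logical condition, all columns kept
females_tall <- df[df$sex == "Female" & df$height > 70, ]

# Rows by condition and a subset of columns, using subset()
nep_hw <- subset(df, country == "nep", select = c(height, weight))

nrow(females_tall)  # 30
dim(nep_hw)         # 50 2
```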

Dummy coding

Data types