Title: | Generate Random Data Sets |
---|---|
Description: | Generates random data sets including: data.frames, lists, and vectors. |
Authors: | Tyler Rinker [aut, cre], Josh O'Brien [ctb], Ananda Mahto [ctb], Matthew Sigal [ctb], Jonathan Carroll [ctb], Scott Westenberger [ctb] |
Maintainer: | Tyler Rinker <[email protected]> |
License: | GPL-2 |
Version: | 0.3.7 |
Built: | 2024-10-29 04:01:18 UTC |
Source: | https://github.com/trinker/wakefield |
Generate a random vector of ages within the provided range. The default age range is 18 to 89, matching the age range that appears in the General Social Survey (see e.g., https://gssdataexplorer.norc.org/variables/53/vshow).
age(n, x = 18:89, prob = NULL, name = "Age")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
Returns a random integer vector of ages within the provided range (defaults to 18:89).
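A generator like this amounts to sampling with replacement from `x`. The base R sketch below is a hypothetical stand-in (`sketch_age` is not part of wakefield). It assumes that `prob = NULL` makes every age in `x` equally likely, which is how `base::sample()` behaves.

```r
# Hypothetical sketch (not wakefield's code): uniform draws from an age range.
sketch_age <- function(n, x = 18:89, prob = NULL) {
  # replace = TRUE lets the same age appear more than once;
  # prob = NULL gives every element of x the same weight
  sample(x, size = n, replace = TRUE, prob = prob)
}

set.seed(1)
ages <- sketch_age(1000)
range(ages)   # always inside 18..89
```

One design caveat: `base::sample()` treats a single number `x` as `1:x`. A real implementation would therefore need to guard against a length-one `x`.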
Other variable functions: animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
age(10) # draw 10 ages with default values
hist(age(n=10000))
interval(age, 3, n = 1000)
animal - Generate a random vector of animals.
pet - Generate a random vector of pets.
animal(n, k = 10, x = wakefield::animal_list, prob = NULL, name = "Animal")
pet(n, x = c("Dog", "Cat", "None", "Bird", "Horse"), prob = c(0.365, 0.304, 0.258, 0.031, 0.015), name = "Pet")
n |
The number of elements to generate. This can be globally set within
the environment of |
k |
The number of elements of x to sample from (uses |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
The household pets and probabilities:
Pet | Percent |
Dog | 36.5 % |
Cat | 30.4 % |
None | 25.8 % |
Bird | 3.1 % |
Horse | 1.5 % |
Returns a random factor vector of animal elements.
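The pet defaults above describe weighted categorical sampling that returns a factor. The base R sketch below illustrates that idea. It is hypothetical: `sketch_pet`, `sketch_animal`, and the small `zoo` vector, which stands in for `wakefield::animal_list`, are not part of wakefield. The description of `k` is truncated in this manual, so the sketch assumes that `k` distinct elements of `x` are picked first and the `n` draws are then taken from that pool.

```r
# Hypothetical sketch (not wakefield's code) of weighted categorical sampling.
pets      <- c("Dog", "Cat", "None", "Bird", "Horse")
pet_probs <- c(0.365, 0.304, 0.258, 0.031, 0.015)   # percentages from the table above

sketch_pet <- function(n, x = pets, prob = pet_probs) {
  # Draw n values with replacement, weighting each element of x by prob;
  # levels = x keeps every pet as a level, even if it is never drawn
  factor(sample(x, size = n, replace = TRUE, prob = prob), levels = x)
}

# Assumed reading of `k`: pick k distinct elements of x first, then draw from them
sketch_animal <- function(n, x, k = 10) {
  pool <- sample(x, size = min(k, length(x)))
  factor(sample(pool, size = n, replace = TRUE), levels = pool)
}

set.seed(42)
p <- sketch_pet(10000)
round(100 * prop.table(table(p)), 1)   # close to the percentages in the table

zoo <- c("Aardvark", "Bison", "Cheetah", "Dingo", "Emu", "Ferret",
         "Gecko", "Heron", "Ibis", "Jaguar", "Koala", "Lemur")
a <- sketch_animal(20, x = zoo, k = 5)
levels(a)                              # the 5 animals in the sampled pool
```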
Other variable functions: age(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
animal(10)
pie(table(animal(10000)))
pet(10)
pie(table(pet(10000)))
A dataset containing a character vector of animals.
data(animal_list)
A character vector with 591 elements
https://a-z-animals.com/animals
Generate a random vector of answers (yes/no).
answer(n, x = c("No", "Yes"), prob = NULL, name = "Answer")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of answers to sample from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
Returns a random factor vector of answer (yes/no) elements.
Other variable functions: age(), animal(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
answer(10)
100*table(answer(n <- 10000))/n
Generate a random vector of areas ("Suburban", "Urban", "Rural").
area(n, x = c("Suburban", "Urban", "Rural"), prob = NULL, name = "Area")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
Returns a random vector of area status elements.
Other variable functions: age(), animal(), answer(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
area(10)
barplot(table(area(10000)))
Converts a data.frame of factors to integers.
as_integer(x, cols = NULL, fun = as.integer)
x |
A |
cols |
Numeric indices of the columns to include (use |
fun |
An |
Returns a data.frame of the same class as x, with integer columns rather than factors.
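The conversion described above can be sketched in base R. The helper `sketch_as_integer` is hypothetical and is not wakefield's implementation. It assumes that only factor columns among `cols` are converted (all columns when `cols = NULL`) and that each factor is replaced by `fun()` applied to it. With the default `as.integer`, that yields each value's level index.

```r
# Hypothetical sketch (not wakefield's code): replace factor columns with integer codes.
sketch_as_integer <- function(x, cols = NULL, fun = as.integer) {
  if (is.null(cols)) cols <- seq_along(x)          # default: consider every column
  for (i in cols) {
    if (is.factor(x[[i]])) x[[i]] <- fun(x[[i]])   # factor -> level index
  }
  x                                                # x keeps its original class
}

df <- data.frame(
  id     = 1:4,
  answer = factor(c("No", "Yes", "Yes", "No"), levels = c("No", "Yes")),
  area   = factor(c("Urban", "Rural", "Urban", "Suburban"),
                  levels = c("Suburban", "Urban", "Rural"))
)

out_all  <- sketch_as_integer(df)             # both factor columns become integers
out_some <- sketch_as_integer(df, cols = 2)   # only `answer` is converted
```

The integer codes follow the factor's level order, not alphabetical order, so `area` maps Suburban, Urban, Rural to 1, 2, 3.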
as_integer(r_series(likert_7, 5, 10))
as_integer(r_series(likert_7, 5, 10), cols = c(2, 4))
library(dplyr)
r_data_frame(n=100, age, political, sex, grade) %>% as_integer(2:3)
Generate a random vector of cars (see ?mtcars).
car(n, x = rownames(datasets::mtcars), prob = NULL, name = "Car")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
Returns a random vector of car elements.
Other variable functions: age(), animal(), answer(), area(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
car(10)
table(car(10000))
Generate a random vector of number of children.
children(n, x = 0:10, prob = c(0.25, 0.25, 0.15, 0.15, 0.1, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01), name = "Children")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
Returns a random vector of number of children elements.
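The default `prob` vector determines the theoretical mean number of children per draw. The short base R calculation below (not part of wakefield) computes that mean from the usage defaults as the sum of x times prob. It then compares the result with a simulated draw made with `sample()`, on the assumption that this mirrors how `children()` samples.

```r
# Default support and probabilities copied from the children() usage above
x    <- 0:10
prob <- c(0.25, 0.25, 0.15, 0.15, 0.1, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01)

sum(prob)                    # the weights sum to 1
expected <- sum(x * prob)    # theoretical mean: 2.11 children
expected

# Simulated check (assumes sampling with replacement weighted by prob)
set.seed(7)
sim <- sample(x, size = 1e5, replace = TRUE, prob = prob)
mean(sim)                    # close to the theoretical mean
```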
Other variable functions: age(), animal(), answer(), area(), car(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
children(10)
pie(table(children(100)))
Generate a random vector of coin flips (heads/tails).
coin(n, x = c("Tails", "Heads"), prob = NULL, name = "Coin")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of coin outcomes to sample from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
Returns a random factor vector of coin flip outcome elements.
Other variable functions: age(), animal(), answer(), area(), car(), children(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
coin(10)
100*table(coin(n <- 10000))/n
color - Generate a random vector of colors (sampled from colors()).
primary - Generate a random vector of psychological primary colors (by default Red, Green, Blue, Yellow, Black, and White).
color(n, k = 10, x = grDevices::colors(), prob = NULL, name = "Color")
primary(n, x = c("Red", "Green", "Blue", "Yellow", "Black", "White"), prob = NULL, name = "Color")
n |
The number of elements to generate. This can be globally set within
the environment of |
k |
The number of elements of x to sample from (uses |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
Returns a random factor vector of color elements.
Other variable functions: age(), animal(), answer(), area(), car(), children(), coin(), date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
color(10)
pie(tab <- table(color(10000)), col = names(tab))
primary(10)
pie(tab <- table(primary(10000)), col = names(tab))
barplot(tab <- table(primary(10000, prob = probs(6))), col = names(tab))
Generate a random vector of dates.
date_stamp(n, random = FALSE, x = NULL, start = Sys.Date(), k = 12, by = "-1 months", prob = NULL, name = "Date")
n |
The number of elements to generate. This can be globally set within
the environment of |
random |
logical. If |
x |
A vector of elements to choose from. This may be |
start |
A date to start the sequence at. |
k |
The length of the sequence (number of elements) to build out from
|
by |
The interval to use in building the sequence. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
Returns a random factor vector of date elements.
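By default `start`, `k`, and `by` describe a 12-date sequence that steps back one month at a time from `start`. The base R sketch below builds that candidate sequence with `seq()` and samples from it. It is a hypothetical illustration: `sketch_date_stamp` is not wakefield's code, the role of `random` is not modelled because its description is truncated here, and a fixed `start` replaces `Sys.Date()` so the result is reproducible.

```r
# Hypothetical sketch (not wakefield's code): build candidate dates, then sample them.
sketch_date_stamp <- function(n, start = as.Date("2024-10-29"), k = 12,
                              by = "-1 months", prob = NULL) {
  # start, start - 1 month, start - 2 months, ... (k dates in total)
  candidates <- seq(start, length.out = k, by = by)
  # sample() subsets the Date vector, so the result keeps the Date class
  sample(candidates, size = n, replace = TRUE, prob = prob)
}

set.seed(3)
d <- sketch_date_stamp(10)
range(d)   # within 2023-11-29 .. 2024-10-29
```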
Other variable functions: age(), animal(), answer(), area(), car(), children(), coin(), color, death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
date_stamp(10)
pie(table(date_stamp(2000, prob = probs(12))))
## Supply dates to `x` to sample from
date_stamp(10, x = seq(as.Date("1980-11-16"), length = 30, by = "1 years"))
Generate a random logical vector of deaths (TRUE/FALSE).
death(n, prob = NULL, name = "Death")
died(n, prob = NULL, name = "Died")
n |
The number of elements to generate. This can be globally set within
the environment of |
prob |
A vector of sampling probabilities for the two outcomes. |
name |
The name to assign to the output vector's |
Returns a random logical vector of death outcome elements.
Other variable functions: age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
death(10)
died(10)
100*table(death(n <- 10000))/n
100*table(death(n <- 10000, prob = c(.3, .7)))/n
r_data_frame(10, died)
Generate a random vector of dice throws.
dice(n, x = 1:6, prob = NULL, name = "Dice")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
Returns a random vector of dice throw elements.
Other variable functions: age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
dice(10)
barplot(table(dice(10000)))
Generate a random vector of DNA nucleobases ("Guanine", "Adenine", "Thymine", "Cytosine").
dna(n, x = c("Guanine", "Adenine", "Thymine", "Cytosine"), prob = NULL, name = "DNA")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
Returns a random vector of DNA nucleobase elements.
Other variable functions: age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
dna(10)
barplot(table(dna(10000)))
Generate a random vector of birth dates.
dob(n, random = TRUE, x = NULL, start = Sys.Date() - 365 * 15, k = 365 * 2, by = "1 days", prob = NULL, name = "DOB")
birth(n, random = TRUE, x = NULL, start = Sys.Date() - 365 * 15, k = 365 * 2, by = "1 days", prob = NULL, name = "Birth")
n |
The number of elements to generate. This can be globally set within
the environment of |
random |
logical. If |
x |
A vector of elements to choose from. This may be |
start |
A date to start the sequence at. |
k |
The length of the sequence (number of elements) to build out from
|
by |
The interval to use in building the sequence. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
Returns a random vector of birth date elements.
Other variable functions: age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
dob(10)
barplot(table(birth(15)))
barplot(table(birth(30)))
Generate a random dummy coded (0/1) vector.
dummy(n, prob = NULL, name = "Dummy")
n |
The number of elements to generate. This can be globally set within
the environment of |
prob |
A vector of sampling probabilities for the two outcomes. |
name |
The name to assign to the output vector's |
Returns a random dummy vector of (0/1) elements.
Other variable functions: age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
dummy(100, name = "Var")
table(dummy(1000))
Generate a random vector of educational attainment level.
education(n, x = c("No Schooling Completed", "Nursery School to 8th Grade", "9th Grade to 12th Grade, No Diploma", "Regular High School Diploma", "GED or Alternative Credential", "Some College, Less than 1 Year", "Some College, 1 or More Years, No Degree", "Associate's Degree", "Bachelor's Degree", "Master's Degree", "Professional School Degree", "Doctorate Degree"), prob = c(0.013, 0.05, 0.085, 0.246, 0.039, 0.064, 0.15, 0.075, 0.176, 0.072, 0.019, 0.012), name = "Education")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
The educational attainments and probabilities used approximately match the U.S. educational attainment make-up (http://www.census.gov):
Highest Attainment | Percent |
No Schooling Completed | 1.3 % |
Nursery School to 8th Grade | 5 % |
9th Grade to 12th Grade, No Diploma | 8.5 % |
Regular High School Diploma | 24.6 % |
GED or Alternative Credential | 3.9 % |
Some College, Less than 1 Year | 6.4 % |
Some College, 1 or More Years, No Degree | 15 % |
Associate's Degree | 7.5 % |
Bachelor's Degree | 17.6 % |
Master's Degree | 7.2 % |
Professional School Degree | 1.9 % |
Doctorate Degree | 1.2 % |
Returns a random vector of educational attainment level elements.
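The rounded percentages in the table add up to 100.1%, so the default `prob` vector sums to 1.001 rather than exactly 1. This is harmless if sampling goes through base R's `sample()`, whose `prob` weights need not sum to one because they are normalized internally. Whether `education()` relies on `sample()` is an assumption here. The calculation below only inspects the defaults and performs the equivalent normalization explicitly.

```r
# Default education probabilities, copied from the usage above
prob <- c(0.013, 0.05, 0.085, 0.246, 0.039, 0.064, 0.15, 0.075, 0.176,
          0.072, 0.019, 0.012)

sum(prob)                        # 1.001: rounded percentages slightly exceed 100
normalized <- prob / sum(prob)   # the same rescaling sample() applies to weights
sum(normalized)                  # 1
```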
http://www.census.gov
Other variable functions: age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
education(10)
pie(table(education(10000)))
Generate a random vector of employment statuses.
employment(n, x = c("Full Time", "Part Time", "Unemployed", "Retired", "Student"), prob = c(0.6, 0.1, 0.1, 0.1, 0.1), name = "Employment")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
The following arbitrary probabilities are used:
Employment Status | Percent |
Full Time | 60% |
Part Time | 10% |
Unemployed | 10% |
Retired | 10% |
Student | 10% |
Returns a random vector of employment status elements.
Other variable functions: age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
employment(10)
pie(table(employment(10000)))
barplot(table(employment(10000)))
Generate a random vector of eye colors.
eye(n, x = c("Brown", "Blue", "Green", "Hazel", "Gray"), prob = c(0.44, 0.3, 0.13, 0.09, 0.04), name = "Eye")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
The eye colors and probabilities:
Color | Percent |
Brown | 44 % |
Blue | 30 % |
Green | 13 % |
Hazel | 9 % |
Gray | 4 % |
Returns a random vector of eye color elements.
Other variable functions: age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
eye(10)
barplot(v <- table(eye(10000)), col = replace(names(v), 4, "yellowgreen"))
grade - Generate a random normal vector of percent grades.
grade_letter - Generate a random normal vector of letter grades.
gpa - Generate a random normal vector of grade point averages (GPA; 0.0 - 4.0 scale).
grade(n, mean = 88, sd = 4, name = "Grade", digits = 1)
grade_letter(n, mean = 88, sd = 4, name = "Grade_Letter")
gpa(n, mean = 88, sd = 4, name = "GPA")
n |
The number of elements to generate. This can be globally set within
the environment of |
mean |
The mean value for the normal distribution to be drawn from. |
sd |
The standard deviation of the normal distribution to draw from. |
name |
The name to assign to the output vector's |
digits |
Integer indicating the number of decimal places to be used.
Negative values are allowed (see |
The conversion between percent range, letter grade, and GPA is:
Percent | Letter | GPA |
97-100 | A+ | 4.00 |
93-96 | A | 4.00 |
90-92 | A- | 3.67 |
87-89 | B+ | 3.33 |
83-86 | B | 3.00 |
80-82 | B- | 2.67 |
77-79 | C+ | 2.33 |
73-76 | C | 2.00 |
70-72 | C- | 1.67 |
67-69 | D+ | 1.33 |
63-66 | D | 1.00 |
60-62 | D- | 0.67 |
< 60 | F | 0.00 |
Returns a random normal vector of grade elements.
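The conversion table above can be expressed with base R's `cut()`. The helpers `to_letter` and `to_gpa` below are hypothetical and are not wakefield's internals. The table lists integer ranges, so the sketch assumes half-open bands (`right = FALSE`): a score belongs to the band whose lower bound it reaches, which means 96.5 counts as "A" and 97.0 as "A+". The percent draw uses `rnorm()` with the documented defaults and is only assumed to resemble `grade()`.

```r
# Hypothetical helpers mapping percent grades to letters and GPA per the table above.
# Band boundaries are the lower bound of each percent range.
breaks        <- c(-Inf, 60, 63, 67, 70, 73, 77, 80, 83, 87, 90, 93, 97, Inf)
letters_scale <- c("F", "D-", "D", "D+", "C-", "C", "C+",
                   "B-", "B", "B+", "A-", "A", "A+")
gpa_scale     <- c(0.00, 0.67, 1.00, 1.33, 1.67, 2.00, 2.33,
                   2.67, 3.00, 3.33, 3.67, 4.00, 4.00)

to_letter <- function(pct) {
  # right = FALSE gives half-open bands [lower, upper)
  as.character(cut(pct, breaks = breaks, labels = letters_scale, right = FALSE))
}
to_gpa <- function(pct) {
  # labels = FALSE returns the band index, used to look up the GPA
  gpa_scale[cut(pct, breaks = breaks, labels = FALSE, right = FALSE)]
}

set.seed(10)
pct <- round(rnorm(5, mean = 88, sd = 4), 1)   # assumed to resemble grade(5)
data.frame(pct, letter = to_letter(pct), gpa = to_gpa(pct))
```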
Other variable functions: age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
grade(10)
hist(grade(10000))
interval(grade, 5, n = 1000)
grade_letter(10)
barplot(table(grade_letter(10000)))
gpa(10)
hist(gpa(10000))
Generate a random vector of grade levels.
grade_level(n, x = c("K", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"), prob = NULL, name = "Grade_Level")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
Returns a random vector of grade level elements.
Other variable functions: age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
grade_level(10)
barplot(table(grade_level(10000)))
A dataset containing a vector of Grady Ward's English words augmented with qdapDictionaries's DICTIONARY, Mark Kantrowitz's names list, other proper nouns, and contractions.
data(grady_augmented)
A character vector with 122806 elements
A dataset containing a vector of Grady Ward's English words augmented with proper nouns (U.S. States, Countries, Mark Kantrowitz's Names List, and months) and contractions. That dataset is augmented to increase the data set size.
Moby Thesaurus List by Grady Ward https://www.gutenberg.org
List of names from Mark Kantrowitz http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names/.
A copy of the http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names/readme.txt
per the author's request.
Generate a random vector of binary groups (e.g., control/treatment).
group(n, x = c("Control", "Treatment"), prob = NULL, name = "Group")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of groups to sample from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
Returns a random factor vector of group (control/treatment) elements.
For more than two groups, see r_sample_factor.
Other variable functions: age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
group(10)
100*table(group(n <- 10000))/n
100*table(group(n <- 10000, prob = c(.3, .7)))/n
Generate a random vector of hair colors.
hair(n, x = c("Brown", "Black", "Blonde", "Red"), prob = c(0.35, 0.28, 0.26, 0.11), name = "Hair")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
The hair colors and probabilities:
Color | Percent |
Brown | 35 % |
Black | 28 % |
Blonde | 26 % |
Red | 11 % |
Returns a random vector of hair color elements.
Other variable functions: age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
hair(10)
v <- table(hair(10000))
lbs <- paste0(names(v), "\n", round(100*v/sum(v), 1), "%")
pie(v, col = replace(names(v), 3, "yellow"), labels = lbs)
height and height_in - Generate a random normal vector of heights in inches.
height_cm - Generate a random normal vector of heights in centimeters.
height(n, mean = 69, sd = 3.75, min = 1, max = NULL, digits = 0, name = "Height")
height_in(n, mean = 69, sd = 3.75, min = 1, max = NULL, digits = 1, name = "Height(in)")
height_cm(n, mean = 175.26, sd = 9.525, min = 1, max = NULL, digits = 1, name = "Height(cm)")
n |
The number of elements to generate. This can be globally set within
the environment of |
mean |
The mean value for the normal distribution to be drawn from. |
sd |
The standard deviation of the normal distribution to draw from. |
min |
A numeric lower boundary cutoff. Results less than this value will be
replaced with |
max |
A numeric upper boundary cutoff. Results greater than this value will
be replaced with |
digits |
Integer indicating the number of decimal places to be used.
Negative values are allowed (see |
name |
The name to assign to the output vector's |
Returns a random normal vector of height elements.
height rounds to the nearest whole number. height_in and height_cm round to the nearest tenth.
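The centimeter defaults are the inch defaults converted at 2.54 cm per inch (69 × 2.54 = 175.26 and 3.75 × 2.54 = 9.525). The sketch below uses a hypothetical sketch_height() helper. It draws from a normal distribution, replaces values outside [min, max] with the boundary itself, and rounds to digits. Replacing with the boundary is an assumption, because the replacement value is not spelled out above.

```r
# Hypothetical sketch: bounded normal draws rounded to `digits`.
# Assumption: out-of-range values are replaced by the boundary value.
sketch_height <- function(n, mean = 69, sd = 3.75, min = 1, max = NULL,
                          digits = 0) {
  out <- rnorm(n, mean = mean, sd = sd)
  if (!is.null(min)) out[out < min] <- min
  if (!is.null(max)) out[out > max] <- max
  round(out, digits)
}

# Centimeter defaults are the inch defaults times 2.54
cm_per_in <- 2.54
c(mean = 69 * cm_per_in, sd = 3.75 * cm_per_in)  # mean 175.26, sd 9.525

set.seed(42)
h_in <- sketch_height(1000, digits = 1)
h_cm <- sketch_height(1000, mean = 175.26, sd = 9.525, digits = 1)
```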
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
height(10)
hist(height(10000))
interval(height, 5, n = 1000)
Generate a random vector of H:M:S times.
hour(n, x = seq(0, 23.5, by = 0.5), prob = NULL, random = FALSE, name = "Hour")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
random |
logical. If |
name |
The name to assign to the output vector's |
Returns a random vector of H:M:S time elements.
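The default x is a grid of fractional hours (0, 0.5, ..., 23.5). Below is a minimal sketch that turns such values into H:M:S strings with sprintf(). The helper to_hms() is hypothetical, and wakefield's own output class may differ.

```r
# Hypothetical helper: fractional hours -> "HH:MM:SS" character strings
to_hms <- function(hours) {
  secs <- round(hours * 3600)          # total seconds
  h <- secs %/% 3600
  m <- (secs %% 3600) %/% 60
  s <- secs %% 60
  sprintf("%02d:%02d:%02d", as.integer(h), as.integer(m), as.integer(s))
}

grid <- seq(0, 23.5, by = 0.5)         # the documented default x
set.seed(5)
to_hms(sample(grid, 5, replace = TRUE))
to_hms(c(0, 13.5, 23.5))               # "00:00:00" "13:30:00" "23:30:00"
```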
hour(20)
hour(20, random = TRUE)
id - Generate a sequential character vector of zero-padded identification numbers (IDs).
id_factor - Generate a sequential factor vector of zero-padded identification numbers (IDs).
id(n, random = FALSE, name = "ID")
id_factor(n, random = FALSE, name = "ID")
n |
The number of elements to generate. This can be globally set within
the environment of |
random |
logical. If |
name |
The name to assign to the output vector's |
Returns an (optionally random) character or factor vector of ID numbers.
id uses sprintf to generate the padded ID. Per sprintf's documentation: “The format string is passed down the OS's sprintf function...The behaviour on inputs not documented here is 'undefined', which means it is allowed to differ by platform.” See sprintf for details.
id is faster than id_factor, as the latter coerces the vector to a factor.
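Below is a minimal sketch of sprintf()-based zero padding. It assumes that the pad width equals the number of digits in n, so id(1000) would run from 0001 to 1000, and that random = TRUE simply shuffles the IDs. The names sketch_id() and sketch_id_factor() are hypothetical.

```r
# Hypothetical sketch of zero-padded IDs built with sprintf()
sketch_id <- function(n, random = FALSE) {
  width <- nchar(format(n, scientific = FALSE))  # digits needed for n
  ids <- sprintf(paste0("%0", width, "d"), seq_len(n))
  if (random) ids <- sample(ids)                 # shuffle instead of sequential
  ids
}

# The factor variant does the same work plus a coercion step,
# which is why it costs a little more.
sketch_id_factor <- function(n, random = FALSE) {
  ids <- sketch_id(n, random)
  factor(ids, levels = sort(ids))
}

sketch_id(12)[c(1, 12)]   # "01" "12"
```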
id(1000)
r_data_frame(n = 21, id)
Generate a random gamma vector of incomes.
income(n, digits = 2, name = "Income")
n |
The number of elements to generate. This can be globally set within
the environment of |
digits |
Integer indicating the number of decimal places to be used.
Negative values are allowed (see |
name |
The name to assign to the output vector's |
Incomes are generated using rgamma(n, 2) * 2000.
Returns a random gamma vector of income elements.
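Taking the formula above at face value, rgamma(n, 2) uses shape 2 with the default rate 1, so its mean is 2 and its variance is 2. Scaling by 2000 therefore gives an expected income of 4000 and a standard deviation of about 2828. A quick simulation check follows; the sample size and seed are arbitrary.

```r
# Check the moments implied by rgamma(n, 2) * 2000
set.seed(2024)
inc <- round(rgamma(1e5, shape = 2) * 2000, 2)   # digits = 2 as in the defaults

c(expected_mean = 2 * 2000, expected_sd = sqrt(2) * 2000)  # 4000, ~2828.4
c(sample_mean = mean(inc), sample_sd = sd(inc))
```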
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
income(10)
hist(income(10000))
pie(table(cut(income(10000), 10)))
Generate a random vector of Internet browsers.
internet_browser( n, x = c("Chrome", "IE", "Firefox", "Safari", "Opera", "Android"), prob = c(0.5027, 0.175, 0.1689, 0.0994, 0.017, 0.0132), name = "Browser" )
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
The browser use and probabilities (from https://gs.statcounter.com/):
Browser | Percent |
Chrome | 50.27 % |
IE | 17.50 % |
Firefox | 16.89 % |
Safari | 9.94 % |
Opera | 1.70 % |
Android | 1.32 % |
Returns a random factor vector of Internet browser elements.
https://gs.statcounter.com/
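The default prob values sum to 0.9762 rather than 1. Base R's sample() treats prob as weights that need not sum to one. Assuming internet_browser() passes prob to sample(), the effective shares are therefore the rescaled values, for example about 51.5% for Chrome.

```r
prob <- c(Chrome = 0.5027, IE = 0.175, Firefox = 0.1689,
          Safari = 0.0994, Opera = 0.017, Android = 0.0132)

sum(prob)                     # 0.9762
round(prob / sum(prob), 4)    # effective sampling shares
```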
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
internet_browser(20)
barplot(table(internet_browser(10000)))
pie(table(internet_browser(10000)))
A wrapper for cut that cuts the vector and then adds the varname produced by the original function.
interval( fun, breaks, ..., labels = NULL, include.lowest = FALSE, right = TRUE, dig.lab = 3, ordered_result = FALSE, n )
fun |
A vector producing function. |
breaks |
Either a numeric vector of two or more unique cut points or a
single number (greater than or equal to 2) giving the number of intervals
into which the vector produced from |
labels |
Labels for the levels of the resulting category. By default,
labels are constructed using "(a,b]" interval notation. If
|
include.lowest |
logical. If |
right |
logical. If |
dig.lab |
An integer which is used when labels are not given. It determines the number of digits used in formatting the break numbers. |
ordered_result |
logical. If |
n |
The number of elements to generate. This can be globally set within
the environment of |
... |
Other arguments passed to |
Returns a cut factor vector.
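cut() returns a plain factor, so a wrapper has to reattach the name afterward. Below is a minimal sketch that assumes the name travels in a "varname" attribute, as described above, and that any remaining arguments go to cut(). sketch_interval() and gen_age() are hypothetical stand-ins, not wakefield functions.

```r
# Hypothetical generator that tags its output with a "varname" attribute
gen_age <- function(n) {
  x <- sample(18:89, n, replace = TRUE)
  attr(x, "varname") <- "Age"
  x
}

# Hypothetical cut() wrapper: bin the generated vector, keep its varname
sketch_interval <- function(fun, breaks, n, ...) {
  x <- fun(n)
  out <- cut(x, breaks = breaks, ...)   # remaining arguments go to cut()
  attr(out, "varname") <- attr(x, "varname")
  out
}

set.seed(3)
a <- sketch_interval(gen_age, 3, n = 1000)
table(a)
attr(a, "varname")   # "Age"
```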
interval(normal, 4, n = 100)
attributes(interval(normal, 4, n = 100))
interval(age, 3, n = 1000)
Generate a random normal vector of intelligence quotients (IQs).
iq(n, mean = 100, sd = 10, min = 0, max = NULL, digits = 0, name = "IQ")
n |
The number of elements to generate. This can be globally set within
the environment of |
mean |
The mean value for the normal distribution to be drawn from. |
sd |
The standard deviation of the normal distribution to draw from. |
min |
A numeric lower boundary cutoff. Results less than this value will be
replaced with |
max |
A numeric upper boundary cutoff. Results greater than this value will
be replaced with |
digits |
Integer indicating the number of decimal places to be used.
Negative values are allowed (see |
name |
The name to assign to the output vector's |
Returns a random normal vector of IQ elements.
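Under the defaults (mean = 100, sd = 10), roughly 68% of draws land between 90 and 110. The lower cutoff min = 0 sits ten standard deviations below the mean, so it essentially never applies. Both figures follow from the normal CDF, ignoring the rounding to digits = 0.

```r
# Share of N(100, 10) draws within one standard deviation of the mean
within_1sd <- pnorm(110, mean = 100, sd = 10) - pnorm(90, mean = 100, sd = 10)
within_1sd   # about 0.683

# Probability that a draw falls below the default min = 0
below_min <- pnorm(0, mean = 100, sd = 10)
below_min    # about 7.6e-24
```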
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
iq(10)
hist(iq(10000))
interval(iq, 5, n = 1000)
Generate a random vector of languages from the languages dataset.
language( n, x = wakefield::languages[["Language"]], prob = wakefield::languages[["Proportion"]], name = "Language" )
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
Returns a random character vector of language elements.
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
language(10)
pie(table(language(10000)))
lang <- wakefield::languages[sample(1:99, 6), ]
lang["prop"] <- lang[["N"]]/sum(lang[["N"]])
labs <- round(100 * lang[["prop"]], 1)
pie(lang[["prop"]], paste0(lang[["Language"]], "\n", labs, "%"))
A dataset containing native language use statistics taken from: https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers.
data(languages)
A data frame with 99 rows and 4 variables
Language. The language spoken.
N. The number of speakers world-wide.
Proportion. The proportion of speakers.
Percent. The percentage of speakers.
https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers
level - Generate a random vector of integer levels (1-4).
math - Generate a random vector of integer mathematics levels (1-4) similar to New York State grades 3-8 assessment results.
ela - Generate a random vector of integer English language arts (ELA) levels (1-4) similar to New York State grades 3-8 assessment results.
level(n, x = 1:4, prob = NULL, name = "Level")
math(n, x = 1:4, prob = c(0.29829, 0.33332, 0.22797, 0.14042), name = "Math")
ela(n, x = 1:4, prob = c(0.3161, 0.37257, 0.2233, 0.08803), name = "ELA")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
The distributions of levels (used in prob) were taken from New York State's 2014 assessment report: http://www.p12.nysed.gov/irs/
Level | ELA | Math |
1 | 31.6% | 29.8% |
2 | 37.3% | 33.3% |
3 | 22.3% | 22.8% |
4 | 8.8% | 14.0% |
Returns a random integer vector of levels (1-4).
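Both default prob vectors sum to 1, and the percentages in the table above are those probabilities rounded. Below is a short base-R sketch of drawing math levels with those weights. It is an illustration only, not wakefield's implementation.

```r
math_prob <- c(0.29829, 0.33332, 0.22797, 0.14042)
ela_prob  <- c(0.3161, 0.37257, 0.2233, 0.08803)
c(math = sum(math_prob), ela = sum(ela_prob))   # both 1

# Weighted draws of integer levels 1-4
set.seed(7)
m <- sample(1:4, 10000, replace = TRUE, prob = math_prob)
round(prop.table(table(m)), 3)
mean(m >= 3)   # share at levels 3 and 4, near 0.228 + 0.140 = 0.368
```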
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
level(10)
barplot(table(level(10000, prob = probs(4))))
math(10)
barplot(table(math(10000)))
ela(10)
barplot(table(ela(10000)))
Generate a random vector of Likert-type responses.
likert(n, x = c("Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"), prob = NULL, name = "Likert")
likert_5(n, x = c("Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"), prob = NULL, name = "Likert")
likert_7(n, x = c("Strongly Agree", "Agree", "Somewhat Agree", "Neutral", "Somewhat Disagree", "Disagree", "Strongly Disagree"), prob = NULL, name = "Likert")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
Returns a random vector of Likert-type response elements.
likert and likert_5 produce identical output, sampling from a 5-point response scale. likert_7 samples from a 7-point response scale.
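Because Likert responses are ordinal, it can help to keep them as an ordered factor and recover integer codes when needed. The sketch below works under that assumption, since wakefield's actual return class is not stated above. The name sketch_likert() is hypothetical.

```r
likert_7_levels <- c("Strongly Agree", "Agree", "Somewhat Agree", "Neutral",
                     "Somewhat Disagree", "Disagree", "Strongly Disagree")

# Hypothetical sketch: uniform draws (prob = NULL) kept as an ordered factor
sketch_likert <- function(n, x = likert_7_levels, prob = NULL) {
  factor(sample(x, n, replace = TRUE, prob = prob), levels = x, ordered = TRUE)
}

set.seed(11)
r <- sketch_likert(20)
table(r)
as.integer(r)   # 1 = "Strongly Agree" ... 7 = "Strongly Disagree"
```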
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
likert(10)
barplot(table(likert(10000)))
likert_7(10)
Generates (pseudo)random lorem ipsum text.
lorem_ipsum(n, ..., name = "Lorem_Ipsum")
paragraph(n, ..., name = "Paragraph")
n |
The number of elements to generate. This can be globally set within
the environment of |
... |
Other arguments passed to |
name |
The name to assign to the output vector's |
Returns a random character vector of string elements.
lorem_ipsum and paragraph produce identical strings but different vector/column names when used inside r_data_frame or r_list.
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
lorem_ipsum(10)
paragraph(10)
lorem_ipsum(10, start_lipsum = FALSE)
Generate a random vector of marital statuses.
marital( n, x = c("Married", "Divorced", "Widowed", "Separated", "Never Married"), prob = NULL, name = "Marital" )
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
Returns a random vector of marital status elements.
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
marital(10)
barplot(table(marital(10000)))
Generate a random vector of military branches.
military( n, x = c("Army", "Air Force", "Navy", "Marine Corps", "Coast Guard"), prob = c(0.3785, 0.2334, 0.2218, 0.1366, 0.0296), name = "Military" )
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
The military branches and default probabilities approximate the makeup of the U.S. military:
Branch | N | Percent |
Army | 541,291 | 37.9% |
Air Force | 333,772 | 23.3% |
Navy | 317,237 | 22.2% |
Marine Corps | 195,338 | 13.7% |
Coast Guard | 42,357 | 3.0% |
Returns a random factor vector of military branch elements.
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
military(10)
barplot(table(military(10000)))
pie(table(military(10000)))
Generate a random vector of minutes in H:M:S format.
minute( n, x = seq(0, 59, by = 1)/60, prob = NULL, random = FALSE, name = "Minute" )
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
random |
logical. If |
name |
The name to assign to the output vector's |
Returns a random vector of minute time elements in H:M:S format.
minute(20)
minute(20, random = TRUE)
pie(table(minute(2000, x = seq(0, 59, by = 10)/60, prob = probs(6))))
Generate a random factor vector of months.
month(n, x = month.name, prob = NULL, name = "Month")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
Returns a random character vector of month elements.
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
month(10)
pie(table(month(10000, prob = probs(12))))
Generate a random vector of first names. This dataset includes all unique entries from the babynames package.
name( n, x = wakefield::name_neutral, prob = NULL, replace = FALSE, name = "Name" )
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
replace |
logical. If |
name |
The name to assign to the output vector's |
Returns a random vector of name elements.
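With replace = FALSE, drawing more names than the pool holds cannot succeed. That is presumably why the examples pass replace = TRUE for 1000 draws from the 662-element name_neutral pool. Base R's sample() shows the behavior, assuming name() forwards replace to it. The three-name pool here is made up for illustration.

```r
pool <- c("Alex", "Jordan", "Taylor")   # made-up pool for illustration

sample(pool, 3)                         # without replacement: a permutation

# Asking for more draws than the pool holds fails without replacement
res <- tryCatch(sample(pool, 5), error = function(e) "error")
res                                     # "error"

sample(pool, 5, replace = TRUE)         # works; names may repeat
```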
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
name(10)
name(100)
name(1000, replace = TRUE)
A dataset containing a character vector of gender-neutral names according to the U.S. Census.
data(name_neutral)
A character vector with 662 elements
http://www.census.gov
normal - A wrapper for rnorm that generates a random normal vector.
normal_round - A wrapper for rnorm that generates a rounded random normal vector.
normal(n, mean = 0, sd = 1, min = NULL, max = NULL, name = "Normal")
normal_round(n, mean = 0, sd = 1, min = NULL, max = NULL, digits = 2, name = "Normal")
n |
The number of elements to generate. This can be globally set within
the environment of |
mean |
The mean value for the normal distribution to be drawn from. |
sd |
The standard deviation of the normal distribution to draw from. |
min |
A numeric lower boundary cutoff. Results less than this value will be
replaced with |
max |
A numeric upper boundary cutoff. Results greater than this value will
be replaced with |
name |
The name to assign to the output vector's |
digits |
Integer indicating the number of decimal places to be used.
Negative values are allowed (see |
Returns a random normal vector.
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
normal(100, name = "Var")
hist(normal(10000, 100, 10))
interval(normal, 9, n = 1000)
Convenience function to view all the columns of a truncated head of a data.frame. peek invisibly returns x, which makes it well suited to a dplyr/magrittr pipeline.
peek(x, n = 10, width = 10, ...)
x |
A |
n |
Number of rows to display. |
width |
The width of the columns to be displayed. |
... |
For internal use. |
By default, dplyr does not print all columns of a data frame (tbl_df). This can make inspecting data difficult, particularly text string data. peek lets the user see a truncated head for inspection purposes.
Prints a truncated head but invisibly returns x.
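Below is a minimal sketch of the same idea: print a truncated head and hand back the input unchanged, so that a pipeline can continue after it. The name sketch_peek() is hypothetical, and the sketch only trims character and factor columns.

```r
# Hypothetical sketch: print a width-trimmed head, return x invisibly
sketch_peek <- function(x, n = 10, width = 10) {
  shown <- head(x, n)
  shown[] <- lapply(shown, function(col) {
    if (is.character(col) || is.factor(col)) substr(as.character(col), 1, width) else col
  })
  print(as.data.frame(shown))
  invisible(x)   # nothing extra printed; the full data flows onward
}

dat <- data.frame(id = 1:20, txt = strrep("abcdefghij", 3),
                  stringsAsFactors = FALSE)
out <- sketch_peek(dat, n = 3, width = 4)
identical(out, dat)   # TRUE
```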
(dat1 <- r_data_frame(100, id, sentence, paragraph))
peek(dat1)
peek(dat1, n = 20)
peek(dat1, width = 40)
library(dplyr)
## Use in a dplyr/magrittr pipeline to view the data (silly example)
par(mfrow = c(2, 2))
r_data_frame(1000, id, sex, pet, employment, eye, sentence, paragraph) %>%
    peek %>%
    (function(x, ind = 2:5){
        invisible(lapply(ind, function(i) pie(table(x[[i]]))))
    })
## A wider data set example
dat2 <- r_data_theme()
dat2
peek(dat2)
Plots a tbl_df object.
## S3 method for class 'tbl_df'
plot(x, ...)
x |
The tbl_df object. |
... |
Arguments passed to |
Generate a random vector of political parties.
political( n, x = c("Democrat", "Republican", "Constitution", "Libertarian", "Green"), prob = c(0.577269133302094, 0.410800432748879, 0.00491084954793489, 0.00372590303330866, 0.0032936813677832), name = "Political" )
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of sampling probabilities for the elements of x. |
name |
The name to assign to the output vector's |
The political parties and default probabilities approximate the U.S. political makeup of registered voters (2014). The default makeup is:
Party | N | Percent |
Democrat | 43,140,758 | 57.73% |
Republican | 30,700,138 | 41.08% |
Constitution | 367,000 | 0.49% |
Libertarian | 278,446 | 0.37% |
Green | 246,145 | 0.33% |
Returns a random factor vector of political party elements.
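The long default probabilities are exactly the registered-voter counts in the table divided by their total (74,732,487). This can be checked directly:

```r
n_voters <- c(Democrat = 43140758, Republican = 30700138, Constitution = 367000,
              Libertarian = 278446, Green = 246145)

sum(n_voters)                       # 74732487
share <- n_voters / sum(n_voters)
round(100 * share, 2)               # 57.73 41.08 0.49 0.37 0.33
```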
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
political(10)
barplot(table(political(10000)))
A dataset containing 2911 ordered sentences used by speakers during the three 2012 presidential debates.
data(presidential_debates_2012)
A character vector with 2911 elements
Prints an available object.
## S3 method for class 'available'
print(x, ...)
x |
The available object |
... |
Ignored. |
Prints a variable object.
## S3 method for class 'variable'
print(x, ...)
x |
The |
... |
Ignored. |
Generate a random vector of probabilities that sum to 1.
probs(j, upper = 1e+06)
j |
An integer giving the number of probability elements (typically performs best when j < 4000). |
upper |
|
Returns a vector of probabilities summing to 1.
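The upper argument is not described above. One plausible approach, offered only as a hypothetical sketch, is to draw j random integer weights between 1 and upper and divide them by their sum.

```r
# Hypothetical sketch: random integer weights normalized to sum to 1
sketch_probs <- function(j, upper = 1e6) {
  w <- sample.int(upper, j, replace = TRUE)   # j weights in 1..upper
  w / sum(w)
}

set.seed(9)
p <- sketch_probs(12)
sum(p)   # 1 (up to floating-point error)

# Usable directly as sampling weights, e.g. for the 12 month names
table(sample(month.name, 1000, replace = TRUE, prob = p))
```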
probs(10)
sum(probs(100))
pie(table(month(10000, prob = probs(12))))
r_data - Generate a data set with pre-set columns selected.
r_data_theme - Generate a themed data set with pre-set columns.
r_data(n = 500, ...)
r_data_theme(n = 100, data_theme = "the_works")
n |
The length to pass to the randomly generated vectors (number of rows). |
data_theme |
A data theme. Current selections include:
|
... |
A set of optionally named arguments. Wakefield variable functions require no name or call parentheses. |
The pre-selected columns include:
ID
Race
Age
Sex
Hour
IQ
Height
Died
The user may use ... to add additional columns. r_data is a convenience function to quickly produce a data set. For more specific usage, use the more flexible r_data_frame function.
Returns a tbl_df.
r_data()
r_data(10)
r_data(10, paragraph, Attending = valid)
peek(r_data_theme())
plot(r_data_theme(), flip = TRUE)
r_data_theme(, "survey")
r_data_theme(, "survey2")
Produce a tbl_df data frame that allows the user to lazily pass unnamed wakefield variable functions (optionally, without call parentheses).
r_data_frame(n, ..., rep.sep = "_")
n |
The length to pass to the randomly generated vectors. |
rep.sep |
A separator to use for repeated variable names. For example
if the |
... |
A set of optionally named arguments. Wakefield variable functions require no name or call parentheses. |
Returns a tbl_df.
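The name handling described above can be sketched in base R. This is a simplified, hypothetical version with three limitations. It accepts only function objects, not bare calls such as age(x = 8:14). It names unnamed columns from a "varname" attribute. It also assumes that repeated names are suffixed 1, 2, ... and joined by rep.sep. gen(), gen_grade(), gen_sex() and sketch_r_data_frame() are all made-up names.

```r
# Hypothetical generator factory: returns a function(n) whose output
# carries a "varname" attribute, mimicking wakefield variable functions
gen <- function(values, varname) {
  force(values); force(varname)
  function(n) {
    x <- sample(values, n, replace = TRUE)
    attr(x, "varname") <- varname
    x
  }
}
gen_grade <- gen(70:100, "Grade")
gen_sex   <- gen(c("Male", "Female"), "Sex")

sketch_r_data_frame <- function(n, ..., rep.sep = "_") {
  funs <- list(...)
  cols <- lapply(funs, function(f) f(n))        # every generator gets the same n
  nms  <- names(funs)
  if (is.null(nms)) nms <- character(length(funs))
  # unnamed arguments fall back to the generator's varname attribute
  unnamed <- nms == ""
  nms[unnamed] <- vapply(cols[unnamed], attr, character(1), which = "varname")
  # repeated names get a counter: Grade, Grade -> Grade_1, Grade_2
  dup <- nms %in% nms[duplicated(nms)]
  counter <- ave(seq_along(nms), nms, FUN = seq_along)
  nms[dup] <- paste(nms[dup], counter[dup], sep = rep.sep)
  names(cols) <- nms
  as.data.frame(cols, stringsAsFactors = FALSE)
}

set.seed(10)
df <- sketch_r_data_frame(5, gen_grade, gen_grade, Gender = gen_sex)
names(df)   # "Grade_1" "Grade_2" "Gender"
```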
Josh O'Brien and Tyler Rinker <[email protected]>.
https://stackoverflow.com/a/29617983/1000343
r_data_frame(n = 30, id, race, age, sex, hour, iq, height, died,
    Scoring = rnorm, Smoker = valid
)
r_data_frame(n = 30, id, race, age(x = 8:14), Gender = sex, Time = hour, iq,
    grade, grade, grade,  #repeated measures
    height(mean = 50, sd = 10), died, Scoring = rnorm, Smoker = valid
)
r_data_frame(n = 500, id, age, age, age, grade, grade, grade)
## Repeated Measures/Time Series
r_data_frame(n = 100, id, age, sex,
    r_series(likert, 3),
    r_series(likert, 4, name = "Item", integer = TRUE)
)
## Expanded Dummy Coded Variables
r_data_frame(n = 100, id, age,
    r_dummy(sex, prefix = TRUE),
    r_dummy(political)
)
## `peek` to view all columns
## `plot` (`table_heat`) for a graphic representation
library(dplyr)
r_data_frame(n = 100, id, dob, animal, grade, grade, death, dummy,
    grade_letter, gender, paragraph, sentence
) %>%
    r_na() %>%
    peek %>%
    plot(palette = "Set1")
Generate dummy-coded columns from the random values of a wakefield variable function.
r_dummy(fun, n, ..., prefix = FALSE, rep.sep = "_")
fun |
A wakefield variable function. |
n |
The number of rows to produce. |
prefix |
logical. If |
rep.sep |
A separator to use for the variable and category part of names
when |
... |
Additional arguments passed to |
Returns a tbl_df.
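Dummy coding turns one categorical column into one 0/1 column per category. Below is a minimal base-R sketch. The name sketch_dummy() is hypothetical, and here prefix is a character string rather than wakefield's logical flag.

```r
# Hypothetical sketch: one 0/1 indicator column per category
sketch_dummy <- function(x, prefix = NULL, rep.sep = "_") {
  lv <- if (is.factor(x)) levels(x) else sort(unique(x))
  m <- outer(as.character(x), lv, "==") * 1L    # logical matrix -> 0/1
  colnames(m) <- if (is.null(prefix)) lv else paste(prefix, lv, sep = rep.sep)
  as.data.frame(m)
}

x <- c("Male", "Female", "Male", "Female", "Male")
sketch_dummy(x, prefix = "Sex")
```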
r_list, r_data_frame, r_series
r_dummy(sex, 10) r_dummy(race, 1000) r_dummy(race, 1000, name = "Ethnicity")
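The idea of expanded dummy coding behind r_dummy can be shown in base R. This sketch is not wakefield's implementation. `dummy_expand()` is a hypothetical helper, and its naming rule (name, then rep.sep, then level when prefix = TRUE) is an assumption modeled on the arguments documented above.

```r
# Hypothetical helper (not wakefield's r_dummy): expand a categorical vector
# into one 0/1 indicator column per level.
dummy_expand <- function(x, name = "Var", prefix = FALSE, rep.sep = "_") {
  x <- as.factor(x)
  lvls <- levels(x)
  # One integer indicator column per level (1 where the row has that level)
  out <- lapply(lvls, function(l) as.integer(x == l))
  # Assumed naming rule: "<name><rep.sep><level>" when prefix = TRUE
  names(out) <- if (prefix) paste(name, lvls, sep = rep.sep) else lvls
  as.data.frame(out, check.names = FALSE)
}

set.seed(1)
sexes <- sample(c("Male", "Female"), 10, replace = TRUE)
dummy_expand(sexes, name = "Sex", prefix = TRUE)
```

Every row has exactly one 1 across the indicator columns. Regression models usually drop one of these columns as the reference category.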
r_data_frame
Safely insert data.frame objects into an r_data_frame or r_list.
r_insert(x, name = "Inserted")
x |
A |
name |
A name to assign to |
Returns a data.frame with attributes(x)[["seriesname"]] assigned.
dat <- tibble::tibble(
    Age_1 = age(100),
    Age_2 = age(100),
    Age_3 = age(100),
    Smokes = smokes(n=100),
    Sick = ifelse(Smokes, sample(5:10, 100, TRUE), sample(0:4, 100, TRUE)),
    Death = ifelse(Smokes, sample(0:1, 100, TRUE, prob = c(.2, .8)),
        sample(0:1, 100, TRUE, prob = c(.7, .3)))
)
r_data_frame(100, id, r_insert(dat))
r_list(10, id, r_insert(dat))
Produce a named list that allows the user to lazily pass unnamed wakefield variable functions (optionally without call parentheses).
r_list(n, ..., rep.sep = "_")
n |
The length to pass to the randomly generated vectors. |
rep.sep |
A separator to use for repeated variable names. For example
if the |
... |
A set of optionally named arguments. Wakefield variable functions can be passed without a name or call parentheses. |
Returns a named list of equal length vectors.
Josh O'Brien and Tyler Rinker.
https://stackoverflow.com/a/29617983/1000343
r_data_frame, r_series, r_dummy
r_list(n = 30, id, race, age, sex, hour, iq, height, died,
    Scoring = rnorm
)
r_list(n = 30, id, race, age(x = 8:14), Gender = sex, Time = hour, iq,
    height(mean=50, sd = 10), died, Scoring = rnorm
)
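Here is one possible meaning of "lazily passing bare functions". `lazy_list()` is a hypothetical base-R sketch. It captures the unevaluated arguments with substitute(), calls any argument that evaluates to a function with n, and names unnamed elements after their expression. It is an assumption about the general approach, not the code r_list uses.

```r
# Hypothetical sketch (not wakefield's r_list): bare functions are called
# with `n`; other values are kept as-is.
lazy_list <- function(n, ...) {
  env <- parent.frame()                          # where the arguments were written
  exprs <- as.list(substitute(list(...)))[-1]    # unevaluated arguments
  arg_names <- names(exprs)
  if (is.null(arg_names)) arg_names <- rep("", length(exprs))
  out <- vector("list", length(exprs))
  for (i in seq_along(exprs)) {
    val <- eval(exprs[[i]], env)
    if (is.function(val)) val <- val(n)          # e.g. rnorm -> rnorm(n)
    out[[i]] <- val
    # Unnamed arguments are named after the expression that produced them
    if (arg_names[i] == "") arg_names[i] <- deparse(exprs[[i]])
  }
  names(out) <- arg_names
  out
}

set.seed(42)
str(lazy_list(5, rnorm, Scoring = runif, Group = sample(c("A", "B"), 5, TRUE)))
```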
Replaces a proportion of values with NA. Useful for simulating missing data.
r_na(x, cols = -1, prob = 0.05)
x |
A |
cols |
Numeric indices of the columns to include (use |
prob |
The proportion of each column's (or vector's) elements to assign to
|
Returns a data.frame or list with random missing values.
r_na(mtcars)
r_na(mtcars, NULL)
library(dplyr)
r_data_frame(n = 30, id, race, age, sex, hour, iq, height, died,
    Scoring = rnorm, Smoker = valid
) %>%
    r_na(prob=.4)
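The masking step that r_na describes can be sketched in base R. `inject_na()` is hypothetical. Two points are assumptions based on the `r_na(mtcars, NULL)` example and the `cols = -1` default: `cols = NULL` means "all columns", and negative indices exclude columns. The number of values blanked per column is taken as round(n * prob).

```r
# Hypothetical sketch (not wakefield's r_na): blank out a proportion `prob`
# of the values in each selected column of a data.frame.
inject_na <- function(x, cols = -1, prob = 0.05) {
  # Assumption: NULL selects every column; negative indices exclude columns
  cols <- if (is.null(cols)) seq_along(x) else seq_along(x)[cols]
  for (j in cols) {
    n <- length(x[[j]])
    k <- round(n * prob)          # number of values to replace in this column
    idx <- sample(n, k)           # rows chosen without replacement
    x[[j]][idx] <- NA
  }
  x
}

set.seed(1)
out <- inject_na(mtcars, cols = 2:4, prob = 0.25)
colSums(is.na(out))
```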
Generate a random vector.
r_sample(n, x = 1:100, prob = NULL, name = "Sample")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of probabilities to choose from. |
name |
The name to assign to the output vector's |
Returns a random vector of elements.
r_sample(100, name = "Var")
table(r_sample(x = c("Dog", "Cat", "Fish", "Bird"), n=1000))
r_sample(x = c("B", "W"), prob = c(.7, .3), n = 25, name = "Race")
r_sample(25, x = c(TRUE, FALSE))
r_sample_binary
- Generate a random binary vector.
r_sample_binary_factor
- Generate a random binary vector and coerce it to a factor.
r_sample_binary(n, x = 1:2, prob = NULL, name = "Binary")
r_sample_binary_factor(n, x = 1:2, prob = NULL, name = "Binary")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of length 2 to sample from. |
prob |
A vector of probabilities to choose from. |
name |
The name to assign to the output vector's |
Returns a random binary vector of elements.
r_sample_binary(100, name = "Var")
table(r_sample_binary(1000))
c("B", "W")[r_sample_binary(10)]
Generate a random vector and coerce it to a factor.
r_sample_factor(n, x = LETTERS, prob = NULL, name = "Factor")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of probabilities to choose from. |
name |
The name to assign to the output vector's |
Returns a random factor vector of elements.
r_sample_factor(100, name = "Var")
table(r_sample_factor(x = c("Dog", "Cat", "Fish", "Bird"), n=1000))
r_sample_factor(x = c("B", "W"), prob = c(.7, .3), n = 25)
Generate a random integer vector.
r_sample_integer(n, x = 1:100, prob = NULL, name = "Integer")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of probabilities to choose from. |
name |
The name to assign to the output vector's |
Returns a random integer vector of elements.
r_sample_integer(100, name = "Var")
table(r_sample_integer(x = c("Dog", "Cat", "Fish", "Bird"), n=1000))
r_sample_integer(x = c("B", "W"), prob = c(.7, .3), n = 25, name = "Race")
r_sample_integer(25, x = c(TRUE, FALSE))
Generate a random logical (TRUE/FALSE) vector.
r_sample_logical(n, prob = NULL, name = "Logical")
n |
The number of elements to generate. This can be globally set within
the environment of |
prob |
A vector of probabilities to choose from. |
name |
The name to assign to the output vector's |
Returns a random logical (TRUE/FALSE) vector of elements.
r_sample_logical(100, name = "Var")
table(r_sample_logical(1000))
c("B", "W")[r_sample_logical(10) + 1]
Generate a random vector and coerce it to an ordered factor.
r_sample_ordered(n, x = LETTERS[1:5], prob = NULL, name = "Ordered")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of probabilities to choose from. |
name |
The name to assign to the output vector's |
Returns a random ordered factor vector of elements.
r_sample_ordered(100, name = "Var")
lvls <- c("Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree")
table(r_sample_ordered(x = lvls, n=1000))
(out <- r_sample_ordered(x = c("Black", "Grey", "White"), prob = c(.5, .2, .3), n = 100))
slices <- c(table(out))
pie(slices, main="Pie Chart of Colors", col = tolower(names(slices)))
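The practical difference between r_sample_factor and r_sample_ordered is the order of the levels. In the hypothetical base-R sketch `sample_ordered()`, the levels keep the order given by x instead of being sorted alphabetically. That ordering is what makes comparisons such as `<` meaningful. Whether wakefield builds its ordered factor exactly this way is an assumption.

```r
# Hypothetical sketch (not wakefield's r_sample_ordered): sample with
# replacement, keep the levels in the order given by `x`, and mark the
# factor as ordered.
sample_ordered <- function(n, x = LETTERS[1:5], prob = NULL) {
  factor(sample(x, n, replace = TRUE, prob = prob), levels = x, ordered = TRUE)
}

lvls <- c("Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree")
set.seed(7)
resp <- sample_ordered(20, lvls)
levels(resp)          # same order as lvls, not alphabetical
resp[1] < resp[2]     # comparisons are defined for ordered factors
```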
Generate a random vector, sampling without replacement by default.
r_sample_replace(n, x = 1:100, prob = NULL, replace = FALSE, name = "Sample")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of probabilities to choose from. |
replace |
logical. If |
name |
The name to assign to the output vector's |
Returns a random vector of elements.
r_sample_replace(100, name = "Var")
table(r_sample_replace(x = c("Dog", "Cat", "Fish", "Bird"), n=1000, replace = TRUE))
r_sample_replace(x = c("B", "W"), prob = c(.7, .3), n = 25, replace = TRUE, name = "Race")
r_sample_replace(2, x = c(TRUE, FALSE))
Produce a tbl_df data frame of repeated measures from a wakefield variable function.
r_series(fun, j, n, ..., integer = FALSE, relate = NULL, rep.sep = "_")
fun |
A wakefield variable function. |
j |
The number of columns to produce. |
n |
The number of rows to produce. |
integer |
logical. If |
relate |
Allows the user to specify the relationship between columns.
May be a named list of |
rep.sep |
A separator to use for repeated variable names. For example
if the |
... |
Additional arguments passed to |
Returns a tbl_df.
https://github.com/trinker/wakefield/issues/1/#issuecomment-96166910
r_series(grade, 5, 10)
## Custom name prefix
r_series(likert, 5, 10, name = "Question")
## Convert factors to integers
r_series(likert_7, 5, 10, integer = TRUE)
## Related variables
r_series(likert, 10, 200, relate = list(operation = "*", mean = 2, sd = 1))
r_series(likert, 10, 200, relate = "--3_1")
r_series(age, 10, 200, relate = "+5_0")
## Change sd to reduce/increase correlation
round(cor(r_series(grade, 10, 10, relate = "+1_2")), 2)
round(cor(r_series(grade, 10, 10, relate = "+1_0")), 2)
round(cor(r_series(grade, 10, 10, relate = "+1_.5")), 2)
round(cor(r_series(grade, 10, 10, relate = "+1_20")), 2)
## Plot Example 1
library(dplyr); library(ggplot2)
dat <- r_data_frame(12, name,
    r_series(likert, 10, relate = "+1_.5")
)
# Suggested use of tidyr or reshape2 package here instead
dat <- data.frame(
    ID = rep(dat[[1]], ncol(dat[-1])),
    stack(dat[-1])
)
dat[["Time"]] <- factor(sub("Variable_", "", dat[["ind"]]), levels = 1:10)
ggplot(dat, aes(x = Time, y = values, color = ID, group = ID)) +
    geom_line(size=.8)
## Plot Example 2
dat <- r_data_frame(12, name,
    r_series(grade, 100, relate = "+1_2")
)
# Suggested use of tidyr or reshape2 package here instead
dat <- data.frame(
    ID = rep(dat[[1]], ncol(dat[-1])),
    ind = rep(colnames(dat[-1]), each = nrow(dat)),
    values = unlist(dat[-1])
)
dat[["Time"]] <- as.numeric(sub("Grade_", "", dat[["ind"]]))
ggplot(dat, aes(x = Time, y = values, color = ID, group = ID)) +
    geom_line(size=.8) + theme_bw()
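The compact relate strings in the examples appear to encode an operation, a mean, and an sd in the form "<operation><mean>_<sd>". Read that way, "--3_1" means subtracting values centered on -3 with an sd of 1. The hypothetical `parse_relate()` decodes this assumed format into the list form that is also shown in the examples. It is a reading of the examples, not wakefield's parser.

```r
# Hypothetical parser for the assumed "<operation><mean>_<sd>" format
# (e.g. "+1_.5", "--3_1"); not wakefield's own code.
parse_relate <- function(s) {
  op <- substr(s, 1, 1)                                      # first character: + - * /
  parts <- strsplit(substring(s, 2), "_", fixed = TRUE)[[1]] # "<mean>", "<sd>"
  list(operation = op, mean = as.numeric(parts[1]), sd = as.numeric(parts[2]))
}

str(parse_relate("+1_.5"))
str(parse_relate("--3_1"))   # operation "-" with mean -3
```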
Generate a random vector of races.
race( n, x = c("White", "Hispanic", "Black", "Asian", "Bi-Racial", "Native", "Other", "Hawaiian"), prob = c(0.637, 0.163, 0.122, 0.047, 0.019, 0.007, 0.002, 0.0015), name = "Race" )
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of probabilities to choose from. |
name |
The name to assign to the output vector's |
The races and probabilities used match the approximate U.S. racial makeup. The default makeup is:
Race | Percent |
White | 63.70 % |
Hispanic | 16.30 % |
Black | 12.20 % |
Asian | 4.70 % |
Bi-Racial | 1.90 % |
Native | .70 % |
Other | .20 % |
Hawaiian | .15 % |
Returns a random factor vector of elements.
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
race(10)
100*table(race(n <- 10000))/n
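The default race probabilities sum to 0.9985 rather than 1. Base R's sample() treats prob as weights and rescales them. Assuming race() samples via sample(), each effective probability is therefore its default divided by 0.9985. The base-R check below illustrates this:

```r
# Default race probabilities copied from the usage above
p <- c(White = 0.637, Hispanic = 0.163, Black = 0.122, Asian = 0.047,
       `Bi-Racial` = 0.019, Native = 0.007, Other = 0.002, Hawaiian = 0.0015)
sum(p)                  # slightly below 1
round(p / sum(p), 4)    # the normalized weights sample() effectively uses

# Draw with the unnormalized weights; sample() rescales them internally
set.seed(10)
draws <- sample(names(p), 10000, replace = TRUE, prob = p)
round(100 * prop.table(table(draws)), 2)
```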
Generate columns that are related.
relate( x, j, name = NULL, operation = "+", mean = 5, sd = 1, rep.sep = "_", digits = max(nchar(sub("^[^.]*.", "", x))) )
x |
A starting column. |
j |
The number of columns to produce. |
name |
An optional prefix name to give to the columns. If |
operation |
An operation character vector of length 1; either
|
mean |
The average value to add, subtract, multiply, or divide by. |
sd |
The amount of variability to allow in |
rep.sep |
A separator to use for repeated variable names. For example
if the |
digits |
The number of digits to round to. Defaults to the max number
of significant digits in |
Returns a tbl_df.
relate(1:10, 10)
(x <- r_data_frame(10, id, relate(1:10, 10, "Time", mean = 2)))
library(ggplot2)
dat <- with(x, data.frame(ID = rep(ID, ncol(x[, -1])), stack(x[, -1])))
dat[["Time"]] <- factor(sub("Time_", "", dat[["ind"]]), levels = 1:10)
ggplot(dat, aes(x = Time, y = values, color = ID, group = ID)) +
    geom_line(size=.8)
relate(1:10, 10, name = "X", operation = "-")
relate(1:10, 10, "X", mean = 1, sd = 0)
relate(1:10, 10, "Var", "*")
relate(1:10, 10, "Var", "/")
relate(gpa(30), 5, mean = .1)
relate(likert(10), 5, mean = .1, sd = .2)
relate(date_stamp(10), 6)
relate(time_stamp(10), 6)
relate(rep(100, 10), 6, "Reaction", "-")
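The column-chaining idea behind relate can be sketched in base R. In the hypothetical `relate_sketch()`, column 1 is x. Each later column applies `operation` to the previous column and a draw from N(mean, sd). The sketch ignores the digits rounding and any type-specific handling (dates, times, factors) that wakefield may perform.

```r
# Hypothetical sketch (not wakefield's relate): each new column is the
# previous column combined with N(mean, sd) noise via `operation`.
relate_sketch <- function(x, j, name = "Var", operation = "+",
                          mean = 5, sd = 1, rep.sep = "_") {
  op <- match.fun(operation)               # "+" -> `+`, "*" -> `*`, ...
  cols <- vector("list", j)
  cols[[1]] <- x                           # the starting column
  for (k in seq_len(j)[-1]) {
    cols[[k]] <- op(cols[[k - 1]], rnorm(length(x), mean, sd))
  }
  names(cols) <- paste(name, seq_len(j), sep = rep.sep)
  as.data.frame(cols)
}

set.seed(3)
relate_sketch(1:5, 4, name = "Time", mean = 2)
```

With `sd = 0` the noise disappears, so each column is simply the previous one plus `mean`. A larger sd weakens the correlation between adjacent columns, which matches the "Change sd" examples for r_series.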
Generate a random vector of religions.
religion( n, x = c("Christian", "Muslim", "None", "Hindu", "Buddhist", "Folk", "Other", "Jewish"), prob = c(0.31477, 0.23163, 0.16323, 0.14985, 0.07083, 0.05882, 0.00859, 0.00227), name = "Religion" )
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of probabilities to choose from. |
name |
The name to assign to the output vector's |
The religions and probabilities used match the approximate world religious makeup (from the Pew Research Center). The default makeup is:
Religion | N | Percent |
Christian | 2,173,260,000 | 31.48 % |
Muslim | 1,599,280,000 | 23.16 % |
None | 1,127,000,000 | 16.32 % |
Hindu | 1,034,620,000 | 14.99 % |
Buddhist | 489,030,000 | 7.08 % |
Folk | 406,140,000 | 5.88 % |
Other | 59,330,000 | .86 % |
Jewish | 15,670,000 | .23 % |
Returns a random factor vector of religion elements.
https://www.pewforum.org/2012/12/18/table-religious-composition-by-country-in-numbers/
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
religion(10)
barplot(table(religion(10000)))
pie(table(religion(10000)))
sat
- Generate a random normal vector of Scholastic Aptitude Test (SAT) scores.
sat(n, mean = 1500, sd = 100, min = 0, max = 2400, digits = 0, name = "SAT")
n |
The number of elements to generate. This can be globally set within
the environment of |
mean |
The mean value for the normal distribution to be drawn from. |
sd |
The standard deviation of the normal distribution to draw from. |
min |
A numeric lower boundary cutoff. Results less than this value will be
replaced with |
max |
A numeric upper boundary cutoff. Results greater than this value will
be replaced with |
digits |
Integer indicating the number of decimal places to be used.
Negative values are allowed (see |
name |
The name to assign to the output vector's |
Returns a random normal vector of SAT elements.
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
sat(10)
hist(sat(10000))
interval(sat, 5, n = 1000)
Generate a random vector of seconds in H:M:S format.
second( n, x = seq(0, 59, by = 1)/3600, prob = NULL, random = FALSE, name = "Second" )
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of probabilities to choose from. |
random |
logical. If |
name |
The name to assign to the output vector's |
Returns a random vector of second time elements in H:M:S format.
second(20)
second(20, random=TRUE)
pie(table(second(2000, x = seq(0, 59, by = 10)/3600, prob = probs(6))))
Generate a random vector of sentences from presidential_debates_2012.
sentence( n, x = wakefield::presidential_debates_2012, prob = NULL, name = "Sentence" )
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of probabilities to choose from. |
name |
The name to assign to the output vector's |
Returns a random character vector of sentence elements.
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
sentence(10)
Adds a seriesname attribute (attributes(x)[["seriesname"]]) to a data.frame.
seriesname(x, name)
x |
A |
name |
A name to assign to |
Returns a data.frame with attributes(x)[["seriesname"]] assigned.
seriesname(mtcars, "Cars")
attributes(seriesname(mtcars, "Cars"))
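For a plain data.frame, attaching the attribute comes down to a single attr() assignment. `set_seriesname()` is a hypothetical minimal equivalent. It is not wakefield's seriesname(), which may do additional work.

```r
# Hypothetical minimal equivalent (not wakefield's seriesname()):
# store `name` in the "seriesname" attribute and return the data.frame.
set_seriesname <- function(x, name) {
  attr(x, "seriesname") <- name
  x
}

cars2 <- set_seriesname(mtcars, "Cars")
attr(cars2, "seriesname")
```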
Generate a random vector of genders.
sex(n, x = c("Male", "Female"), prob = c(0.51219512195122, 0.48780487804878), name = "Sex")
gender(n, x = c("Male", "Female"), prob = c(0.51219512195122, 0.48780487804878), name = "Gender")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of length 2 to sample from. |
prob |
A vector of probabilities to choose from. |
name |
The name to assign to the output vector's |
The genders and probabilities used match the approximate gender makeup:
Gender | Percent |
Male | 51.22 % |
Female | 48.78 % |
Returns a random factor vector of gender elements.
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
sex(10)
100*table(sex(n <- 10000))/n
Generate a random vector of genders that includes a non-binary category. The proportion for the Trans* category was taken from the Williams Institute report (2011) and subtracted equally from the Male and Female categories.
sex_inclusive(n, x = c("Male", "Female", "Intersex"), prob = NULL, name = "Sex")
gender_inclusive(n, x = c("Male", "Female", "Trans*"), prob = NULL, name = "Gender")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of probabilities to choose from. |
name |
The name to assign to the output vector's |
The genders and probabilities used match the approximate gender makeup:
Gender | Percent |
Male | 51.07 % |
Female | 48.63 % |
Trans* | 0.30 % |
Returns a random factor vector of sex or gender elements.
Matthew Sigal
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year(), zip_code()
sex_inclusive(10)
barplot(table(sex_inclusive(10000)))
gender_inclusive(10)
barplot(table(gender_inclusive(10000)))
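The Gender table above follows from the sex() defaults. Taking 0.15 percentage points from Male and 0.15 from Female makes room for the 0.30 % Trans* share. The base-R arithmetic below reproduces the rounded percentages, assuming the Trans* share is exactly 0.003.

```r
# Binary defaults from sex(); move 0.3 % into Trans*, half from each side
binary <- c(Male = 0.51219512195122, Female = 0.48780487804878)
trans  <- 0.003
inclusive <- c(binary - trans / 2, `Trans*` = trans)
round(100 * inclusive, 2)   # percentages as in the table
sum(inclusive)              # still sums to 1
```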
Generate a random logical (TRUE/FALSE) smokes vector.
smokes(n, prob = c(0.822, 0.178), name = "Smokes")
n |
The number of elements to generate. This can be globally set within
the environment of |
prob |
A vector of probabilities to choose from. |
name |
The name to assign to the output vector's |
The probabilities are non-smoker: 82.2% vs. smoker: 17.8%.
Returns a random logical vector of smokes elements.
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), speed(), state(), string(), upper(), valid(), year(), zip_code()
smokes(10)
100*table(smokes(n <- 1000))/n
speed and speed_mph - Generate a random normal vector of speeds in miles per hour.
speed_kph - Generate a random normal vector of speeds in kilometers per hour.
speed(n, mean = 55, sd = 10, min = 0, max = NULL, digits = 0, name = "Speed")
speed_mph(n, mean = 55, sd = 10, min = 0, max = NULL, digits = 1, name = "Speed(mph)")
speed_kph(n, mean = 88.5, sd = 16, min = 0, max = NULL, digits = 1, name = "Speed(kph)")
n |
The number of elements to generate. This can be globally set within
the environment of |
mean |
The mean value for the normal distribution to be drawn from. |
sd |
The standard deviation of the normal distribution to draw from. |
min |
A numeric lower boundary cutoff. Results less than this value will be
replaced with |
max |
A numeric upper boundary cutoff. Results greater than this value will
be replaced with |
digits |
Integer indicating the number of decimal places to be used.
Negative values are allowed (see |
name |
The name to assign to the output vector's |
Returns a random normal vector of speed elements.
speed rounds to the nearest whole number; speed_mph and speed_kph round to the nearest tenth.
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), state(), string(), upper(), valid(), year(), zip_code()
speed(10)
hist(speed(10000))
interval(speed, 5, n = 1000)
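The speed_kph defaults appear to be the speed_mph defaults converted with the exact factor 1 mile = 1.609344 km. On that reading, 55 mph is about 88.5 km/h, and an sd of 10 mph is about 16.1 km/h, which the signature rounds to 16. A quick base-R check of the arithmetic:

```r
mph_to_kph <- 1.609344        # exact: 1 international mile = 1.609344 km
mean_kph <- 55 * mph_to_kph   # 88.51392
sd_kph   <- 10 * mph_to_kph   # 16.09344
round(c(mean = mean_kph, sd = sd_kph), 1)
```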
Generate a random factor vector of states.
state( n, x = datasets::state.name, prob = wakefield::state_populations[["Proportion"]], name = "State" )
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A vector of elements to choose from. |
prob |
A vector of probabilities to choose from. |
name |
The name to assign to the output vector's |
The state populations and probabilities:
State | Population | Percent |
California | 37,253,956 | 12.09 % |
Texas | 25,145,561 | 8.16 % |
New York | 19,378,102 | 6.29 % |
Florida | 18,801,310 | 6.10 % |
Illinois | 12,830,632 | 4.16 % |
Pennsylvania | 12,702,379 | 4.12 % |
Ohio | 11,536,504 | 3.74 % |
Michigan | 9,883,640 | 3.21 % |
Georgia | 9,687,653 | 3.14 % |
North Carolina | 9,535,483 | 3.09 % |
New Jersey | 8,791,894 | 2.85 % |
Virginia | 8,001,024 | 2.60 % |
Washington | 6,724,540 | 2.18 % |
Massachusetts | 6,547,629 | 2.12 % |
Indiana | 6,483,802 | 2.10 % |
Arizona | 6,392,017 | 2.07 % |
Tennessee | 6,346,105 | 2.06 % |
Missouri | 5,988,927 | 1.94 % |
Maryland | 5,773,552 | 1.87 % |
Wisconsin | 5,686,986 | 1.85 % |
Minnesota | 5,303,925 | 1.72 % |
Colorado | 5,029,196 | 1.63 % |
Alabama | 4,779,736 | 1.55 % |
South Carolina | 4,625,364 | 1.50 % |
Louisiana | 4,533,372 | 1.47 % |
Kentucky | 4,339,367 | 1.41 % |
Oregon | 3,831,074 | 1.24 % |
Oklahoma | 3,751,351 | 1.22 % |
Connecticut | 3,574,097 | 1.16 % |
Iowa | 3,046,355 | .99 % |
Mississippi | 2,967,297 | .96 % |
Arkansas | 2,915,918 | .95 % |
Kansas | 2,853,118 | .93 % |
Utah | 2,763,885 | .90 % |
Nevada | 2,700,551 | .88 % |
New Mexico | 2,059,179 | .67 % |
West Virginia | 1,852,994 | .60 % |
Nebraska | 1,826,341 | .59 % |
Idaho | 1,567,582 | .51 % |
Hawaii | 1,360,301 | .44 % |
Maine | 1,328,361 | .43 % |
New Hampshire | 1,316,470 | .43 % |
Rhode Island | 1,052,567 | .34 % |
Montana | 989,415 | .32 % |
Delaware | 897,934 | .29 % |
South Dakota | 814,180 | .26 % |
Alaska | 710,231 | .23 % |
North Dakota | 672,591 | .22 % |
Vermont | 625,741 | .20 % |
Wyoming | 563,626 | .18 % |
Returns a random character vector of state elements.
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), string(), upper(), valid(), year(), zip_code()
state(10)
pie(table(state(10000)))
sort(100*table(state(n <- 10000))/n)
A dataset containing U.S. state populations.
data(state_populations)
A data frame with 50 rows and 3 variables
State. The 50 U.S. states.
Population. Population of state.
Proportion. Proportion of total U.S. population.
https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population
Generate a random vector of strings.
string(n, x = "[A-Za-z0-9]", length = 10, name = "String")
n |
The number of elements to generate. This can be globally set within
the environment of |
x |
A character vector specifying character classes to draw elements from. |
length |
Integer vector, desired string lengths. |
name |
The name to assign to the output vector's |
Returns a random character vector of string elements.
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), upper(), valid(), year(), zip_code()
string(10)
Generate a heat map of column types from a data.frame.
table_heat(x, flip = FALSE, palette = "Set3", print = interactive(), sep = "\n")
x: A data.frame.
flip: logical. If
palette: A palette to choose from. See
print: logical. If
sep: A separator to use between column types. Column types are determined via
By default, column names retain their order. Column types are ordered alphabetically in the legend, with NA appearing last.
Returns a ggplot2 object.
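The heat map is built from a grid of per-cell column types, with missing values shown as NA and listed last in the legend. The base R sketch below computes such a grid and the legend order under the assumption that a column's type is its class() pasted together with sep; column_type_grid is a hypothetical name, and the ggplot2 drawing step that table_heat performs is not reproduced.

```r
# Hypothetical helper (not part of wakefield): a cell-by-cell grid of
# column types, with NA wherever the underlying value is missing.
column_type_grid <- function(x, sep = "\n") {
  # One type string per column; multiple classes are pasted with sep
  types <- vapply(x, function(col) paste(class(col), collapse = sep), character(1))
  # Repeat each column's type down all rows (column-major fill)
  grid <- matrix(rep(types, each = nrow(x)), nrow = nrow(x),
                 dimnames = list(NULL, names(x)))
  # Blank out cells whose value is missing
  grid[is.na(x)] <- NA
  grid
}

g <- column_type_grid(airquality)
# Legend order: types sorted alphabetically, NA last
sort(unique(as.vector(g)), na.last = TRUE)
```

With CO2, the ordered factor Plant yields the two-part type "ordered\nfactor", which is why a newline is a sensible default separator for axis or legend labels.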
table_heat(mtcars) # boring
table_heat(CO2)
table_heat(iris)
table_heat(state_populations)
dat <- r_data_frame(100, lorem_ipsum, birth, animal, age, grade, grade, death, dummy, grade_letter)
table_heat(dat)
table_heat(dat, flip = TRUE)
table_heat(r_data_theme(), flip = TRUE)
## NA values
table_heat(r_na(dat, NULL))
## Colors
table_heat(r_na(dat, NULL), palette = NULL)
table_heat(r_na(dat, NULL), palette = "Set1")
table_heat(r_na(dat, NULL), palette = "Set2")
table_heat(r_na(dat, NULL), palette = "Dark2")
table_heat(r_na(dat, NULL), palette = "Spectral")
table_heat(r_na(dat, NULL), palette = "Reds")
Generate a random vector of times in H:M:S format.
time_stamp(n, x = seq(0, 23, by = 1), prob = NULL, random = FALSE, name = "Time")
n: The number of elements to generate. This can be globally set within the environment of r_data_frame or r_list.
x: A vector of elements to choose from.
prob: A vector of probability weights for the elements of x.
random: logical. If
name: The name to assign to the output vector's varname attribute.
Returns a random vector of time elements in H:M:S format.
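The signature samples hours from x (optionally weighted by prob) and formats them as H:M:S. The base R sketch below illustrates that, with a hypothetical random_min_sec flag that draws minutes and seconds uniformly; whether that matches what wakefield's random argument does is an assumption, since its description is incomplete here. sketch_time_stamp is not a wakefield function.

```r
# Hypothetical sketch, not wakefield's time_stamp().
sketch_time_stamp <- function(n, x = 0:23, prob = NULL, random_min_sec = FALSE) {
  # Hours come from x, weighted by prob when supplied
  hours <- sample(x, n, replace = TRUE, prob = prob)
  if (random_min_sec) {
    # Assumption: minutes and seconds drawn uniformly from 0-59
    mins <- sample(0:59, n, replace = TRUE)
    secs <- sample(0:59, n, replace = TRUE)
  } else {
    # Otherwise every time falls exactly on the hour
    mins <- secs <- integer(n)
  }
  # Zero-padded H:M:S strings
  sprintf("%02d:%02d:%02d", hours, mins, secs)
}

set.seed(2)
sketch_time_stamp(5)
sketch_time_stamp(5, random_min_sec = TRUE)
sketch_time_stamp(5, x = seq(0, 23, by = 2), prob = rep(1/12, 12))
```

sprintf() accepts whole-number doubles for %d, so seq(0, 23, by = 2) works as x even though it is not an integer vector.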
time_stamp(20)
time_stamp(20, random = TRUE)
pie(table(time_stamp(2000, x = seq(0, 23, by = 2), prob = probs(12))))
upper - Generates a random character vector of upper case letters.
lower - Generates a random character vector of lower case letters.
upper_factor - Generates a random factor vector of upper case letters.
lower_factor - Generates a random factor vector of lower case letters.
upper(n, k = 5, x = LETTERS, prob = NULL, name = "Upper")
lower(n, k = 5, x = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"), prob = NULL, name = "Lower")
upper_factor(n, k = 5, x = LETTERS, prob = NULL, name = "Upper")
lower_factor(n, k = 5, x = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"), prob = NULL, name = "Lower")
n: The number of elements to generate. This can be globally set within the environment of r_data_frame or r_list.
k: The number of elements of x to sample from (uses 1:k).
x: A vector of elements to choose from.
prob: A vector of probability weights for the elements of x.
name: The name to assign to the output vector's varname attribute.
Returns a random character/factor vector of letter elements.
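Since k restricts sampling to the first k elements of x, the default call draws only from A through E. A base R sketch of that rule, with an option to return a factor, is shown below. sketch_letters is a hypothetical name, and keeping all k letters as factor levels (even unsampled ones) is an assumption about the _factor variants, not documented behavior.

```r
# Hypothetical sketch, not wakefield's upper()/lower() family.
sketch_letters <- function(n, k = 5, x = LETTERS, prob = NULL, as_factor = FALSE) {
  pool <- x[seq_len(k)]  # only the first k elements of x are eligible
  out <- sample(pool, size = n, replace = TRUE, prob = prob)
  # Assumption: the factor version keeps all k letters as levels
  if (as_factor) factor(out, levels = pool) else out
}

set.seed(5)
sketch_letters(10)
sketch_letters(10, x = letters, as_factor = TRUE)
```

When prob is supplied it must have length k (one weight per eligible letter), which is why the examples pair upper(...) with probs(5).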
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), valid(), year(), zip_code()
upper(10)
lower(10)
upper_factor(10)
lower_factor(10)
barplot(table(upper(10000)))
barplot(table(upper(10000, prob = probs(5))))
Generate a random logical (TRUE/FALSE) vector.
valid(n, prob = NULL, name = "Valid")
n: The number of elements to generate. This can be globally set within the environment of r_data_frame or r_list.
prob: A vector of probability weights for the logical values.
name: The name to assign to the output vector's varname attribute.
Returns a random logical vector of elements.
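A weighted logical vector and the percentage table used in the example can be reproduced with base R alone. This sketch assumes the weights are given in the order (TRUE, FALSE); that ordering is an assumption and should be checked against wakefield's source before relying on prob.

```r
# Base R sketch, not wakefield's valid().
# Assumption: prob is ordered as (TRUE, FALSE).
set.seed(10)
v <- sample(c(TRUE, FALSE), size = 1000, replace = TRUE, prob = c(0.8, 0.2))

# Percentage of each value, as in the 100*table(...)/n example
100 * table(v) / length(v)
```

Because TRUE counts as 1, mean(v) gives the same TRUE proportion as the table in a single number.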
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), year(), zip_code()
valid(10)
100*table(valid(n <- 1000))/n
See a listing of all available variable functions for use in r_data_frame or r_list.
variables(type = NULL, ncols = 5, ...)
type: The output type. Must be either
ncols: The number of columns to use if
...: Other arguments passed to
Returns a character vector or matrix of all variable functions, or a list of variable functions by type.
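The matrix output lays the function names out in ncols columns, and the example variables("matrix", byrow = TRUE) suggests that extra arguments reach base R's matrix(). The sketch below shows that layout under two assumptions: that ... is forwarded to matrix(), and that short final rows are padded with empty strings. as_ncol_matrix is a hypothetical helper, not wakefield's variables().

```r
# Hypothetical helper, not wakefield's variables(): lay a character vector
# out in ncols columns, padding with "" so the matrix is complete.
as_ncol_matrix <- function(x, ncols = 5, ...) {
  total <- ceiling(length(x) / ncols) * ncols
  padded <- c(x, rep("", total - length(x)))
  # Extra arguments such as byrow = TRUE go straight to matrix()
  matrix(padded, ncol = ncols, ...)
}

fns <- c("age", "animal", "answer", "area", "car", "children", "coin")
as_ncol_matrix(fns, ncols = 3)               # filled column by column
as_ncol_matrix(fns, ncols = 3, byrow = TRUE) # filled row by row
```

Padding avoids matrix()'s recycling warning, which would otherwise repeat the first names to fill the last column.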
variables()
variables("list")
variables(TRUE)
names(variables("list"))
variables("ordered factor")
variables("numeric")
variables("matrix")
variables("matrix", ncols=3)
variables("matrix", 1)
variables("matrix", byrow = TRUE)
Adds the class variable and an internal attributes(x)[["varname"]] attribute to a vector.
varname(x, name)
x: A vector to add a varname attribute to.
name: A name to assign to x.
Returns a vector of the class variable with an attributes(x)[["varname"]] attribute assigned.
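Attaching a name and a class to a vector needs only attr() and class(). The base R sketch below mirrors the documented result (a "varname" attribute plus the class variable) and shows that ordinary operations such as sum() and paste() still work on the tagged vector. add_varname is a hypothetical name, and prepending "variable" to the existing class vector is an assumption about how wakefield combines classes.

```r
# Hypothetical re-creation, not wakefield's varname(): store the name in a
# "varname" attribute and prepend "variable" to the existing class vector.
add_varname <- function(x, name) {
  attr(x, "varname") <- name
  class(x) <- c("variable", class(x))
  x
}

a <- add_varname(1:10, "A")
attributes(a)
sum(a)
paste(add_varname(LETTERS, "Caps"), collapse = "")
```

Keeping the original class after "variable" means methods for the underlying type still apply when no variable-specific method exists.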
varname(1:10, "A") attributes(varname(1:10, "A")) sum(varname(1:10, "A")) varname(LETTERS, "Caps") attributes(varname(LETTERS, "Caps")) paste(varname(LETTERS, "Caps"), collapse="")
varname(1:10, "A") attributes(varname(1:10, "A")) sum(varname(1:10, "A")) varname(LETTERS, "Caps") attributes(varname(LETTERS, "Caps")) paste(varname(LETTERS, "Caps"), collapse="")
Generates random data sets, including data.frames, lists, and vectors.
Generate a random vector of years.
year(n, x = 1996:as.numeric(format(Sys.Date(), "%Y")), prob = NULL, name = "Year")
n: The number of elements to generate. This can be globally set within the environment of r_data_frame or r_list.
x: A vector of elements to choose from.
prob: A vector of probability weights for the elements of x.
name: The name to assign to the output vector's varname attribute.
Returns a random vector of year elements.
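The default x is recomputed from Sys.Date(), so its upper bound moves with the calendar. The base R sketch below rebuilds that range and draws weighted years. Arbitrary normalized runif() weights stand in for wakefield's probs(); they are an illustrative substitute, not what probs() computes.

```r
# Base R sketch, not wakefield's year(): the default x runs from 1996
# through the current calendar year, recomputed each time it is evaluated.
current_year <- as.numeric(format(Sys.Date(), "%Y"))
yrs <- 1996:current_year

# Arbitrary normalized weights stand in for wakefield's probs()
set.seed(3)
w <- runif(length(yrs))
w <- w / sum(w)

y <- sample(yrs, size = 10000, replace = TRUE, prob = w)
table(y)
```

Supplying an explicit x (as the example does with 1996:2016) makes the output independent of the date the code is run, which helps reproducibility.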
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), zip_code()
year(10)
pr <- probs(length(1996:2016))
pie(table(year(10000, x= 1996:2016, prob = pr)))
Generate a random vector of zip codes.
zip_code(n, k = 10, x = 10000:99999, prob = NULL, name = "Zip")
n: The number of elements to generate. This can be globally set within the environment of r_data_frame or r_list.
k: The number of elements of x to sample from (uses
x: A vector of elements to choose from.
prob: A vector of probability weights for the elements of x.
name: The name to assign to the output vector's varname attribute.
Returns a random vector of zip code elements.
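With k = 10, the example pie(table(zip_code(10000, prob = probs(10)))) shows only ten distinct codes, so the output is drawn from a small pool of codes rather than from all of x. The base R sketch below reproduces that two-stage draw under a stated assumption: the k pool codes are chosen at random from x (the description of k is incomplete above, so this is not confirmed). Equal weights replace probs() for simplicity.

```r
# Base R sketch, not wakefield's zip_code(). Assumption: k codes are first
# drawn at random from x, and the n output values are sampled from those k.
set.seed(7)
k <- 10
pool <- sample(10000:99999, k)

# prob must have one weight per pool element (equal weights here)
z <- sample(pool, size = 10000, replace = TRUE, prob = rep(1 / k, k))
sort(table(z), decreasing = TRUE)
```

Because the default x starts at 10000, every code already has five digits and no zero-padding is needed.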
Other variable functions:
age(), animal(), answer(), area(), car(), children(), coin(), color, date_stamp(), death(), dice(), dna(), dob(), dummy(), education(), employment(), eye(), grade_level(), grade(), group(), hair(), height(), income(), internet_browser(), iq(), language, level(), likert(), lorem_ipsum(), marital(), military(), month(), name, normal(), political(), race(), religion(), sat(), sentence(), sex_inclusive(), sex(), smokes(), speed(), state(), string(), upper(), valid(), year()
zip_code(10)
pie(table(zip_code(10000, prob = probs(10))))