| Title: | Generate Summary Tables for Categorical, Ordinal, and Continuous Data |
|---|---|
| Description: | Provides functions for tabulating and summarizing categorical, multiple response, ordinal, and continuous variables in R data frames. Makes it easy to create clear, structured summary tables, so you spend less time wrangling data and more time interpreting it. |
| Authors: | Ama Nyame-Mensah [aut, cre] |
| Maintainer: | Ama Nyame-Mensah <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.1.9000 |
| Built: | 2026-05-26 10:28:13 UTC |
| Source: | https://github.com/anyamemensah/summarytabl |
cat_group_tbl() summarizes nominal or categorical
variables by a grouping variable, returning frequency counts and
percentages.
cat_group_tbl(data, row_var, col_var, margins = "all", na.rm.row_var = FALSE, na.rm.col_var = FALSE, pivot = "longer", only = NULL, ignore = NULL)
data |
A data frame. |
row_var |
A character string of the name of a variable in |
col_var |
A character string of the name of a variable in |
margins |
A character string that determines how percentage values
are calculated: whether they sum to one across rows, columns, or the
entire table (i.e., all). Defaults to |
na.rm.row_var |
A logical value indicating whether missing values for
|
na.rm.col_var |
A logical value indicating whether missing values for
|
pivot |
A character string that determines the format of the table. By
default, |
only |
A character string or vector of character strings of the types
of summary data to return. Default is |
ignore |
An optional named vector or list that defines values to exclude
from |
A tibble showing the count and percentage of each category in row_var
by each category in col_var.
Ama Nyame-Mensah
cat_group_tbl(data = nlsy, row_var = "gender", col_var = "bthwht", pivot = "wider", only = "count")
cat_group_tbl(data = nlsy, row_var = "birthord", col_var = "breastfed", pivot = "longer")
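The `margins` argument decides which total each count is divided by. As a rough illustration only (this is not how `cat_group_tbl()` is implemented, and the toy `gender`/`bthwht` values below are made up rather than taken from `nlsy`), base R's `table()` and `prop.table()` show the three possible denominators:

```r
# Toy data (hypothetical values, not the nlsy dataset)
toy <- data.frame(
  gender = c(1, 1, 0, 0, 0, 1),
  bthwht = c(0, 1, 0, 0, 1, 0)
)

# Cross-tabulate counts: rows = gender, columns = bthwht
counts <- table(gender = toy$gender, bthwht = toy$bthwht)

# Percentages relative to the whole table: all cells sum to 1
pct_all <- prop.table(counts)

# Percentages within each row: every row sums to 1
pct_rows <- prop.table(counts, margin = 1)

# Percentages within each column: every column sums to 1
pct_cols <- prop.table(counts, margin = 2)

# Long layout (one row per cell), similar in spirit to pivot = "longer"
long <- as.data.frame(counts, responseName = "count")
long$percent <- as.vector(pct_all)
long
```

The exact strings that select row or column denominators in `margins` are not shown in this excerpt, so the sketch deliberately avoids calling `cat_group_tbl()` itself.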
cat_tbl() summarizes nominal or categorical variables,
returning frequency counts and percentages.
cat_tbl(data, var, na.rm = FALSE, only = NULL, ignore = NULL)
data |
A data frame. |
var |
A character string of the name of a variable in |
na.rm |
A logical value indicating whether missing values should be
removed before calculations. Default is |
only |
A character string or vector of character strings of the types
of summary data to return. Default is |
ignore |
An optional vector that contains values to exclude from |
A tibble showing the count and percentage of each category in var.
Ama Nyame-Mensah
cat_tbl(data = nlsy, var = "gender")
cat_tbl(data = nlsy, var = "race", only = "count")
cat_tbl(data = nlsy, var = "race", ignore = "Hispanic", only = "percent", na.rm = TRUE)
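A minimal sketch of the one-variable count-and-percentage idea behind `cat_tbl()`. The helper `count_pct()` is hypothetical, and it assumes that values listed in `ignore` are dropped before percentages are computed and that `na.rm = FALSE` keeps missing values as their own category; check the package output before relying on either assumption.

```r
# Hypothetical helper, not part of summarytabl
count_pct <- function(x, na.rm = FALSE, ignore = NULL) {
  # Assumption: ignored values are removed before percentages are computed
  if (!is.null(ignore)) x <- x[!(x %in% ignore)]
  # Optionally drop missing values; otherwise NA is counted as a category
  if (na.rm) x <- x[!is.na(x)]
  counts <- table(x, useNA = "ifany")
  data.frame(
    value = names(counts),
    count = as.vector(counts),
    percent = as.vector(counts) / sum(counts)
  )
}

# Made-up responses, not the nlsy data
race <- c("Black", "Hispanic", "Black", NA, "Non-Black, Non-Hispanic", "Black")
count_pct(race)
count_pct(race, na.rm = TRUE, ignore = "Hispanic")
```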
This function checks whether a named list or vector contains invalid values
(such as NULL or NA) or invalid names (such as missing or empty names),
verifies that the number of valid names matches the number of supplied
values, and confirms that the valid names in the object correspond to the
supplied names. If any of these checks fails, the function returns the
default value.
check_named_vctr(x, names, default)
x |
A named vector. |
names |
A character vector or list of character vectors of length one specifying the names to be matched. |
default |
Default value to return if any of the checks fails. |
Either the original object, x, or the default value.
Ama Nyame-Mensah
# returns NULL
check_named_vctr(x = c(one = 1, two = 2, 3), names = c("one", "two", "three"), default = NULL)
# returns x
check_named_vctr(x = list(one = 1, two = 2, three = 3), names = list("one", "two", "three"), default = NULL)
# also returns x
check_named_vctr(x = c(baako = 1, mmienu = 2, mmiensa = 3), names = list("baako", "mmienu", "mmiensa"), default = NULL)
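The checks described above can be sketched in base R. `check_named_sketch()` is a hypothetical stand-in, not the package's source; it renames the `names` argument to `expected_names` for readability and is written only to reproduce the three documented outcomes.

```r
# Hypothetical re-implementation of the described checks (not package code)
check_named_sketch <- function(x, expected_names, default) {
  # 1. Reject empty input and elements that are NULL or a single NA
  if (is.null(x) || length(x) == 0) return(default)
  bad_value <- vapply(x, function(v) is.null(v) || (length(v) == 1 && is.na(v)),
                      logical(1))
  if (any(bad_value)) return(default)
  # 2. Reject objects without names
  nm <- names(x)
  if (is.null(nm)) return(default)
  # 3. Every supplied value must carry a valid (non-NA, non-empty) name
  valid <- !is.na(nm) & nzchar(nm)
  if (sum(valid) != length(x)) return(default)
  # 4. Every name in x must appear among the expected names
  if (!all(nm %in% unlist(expected_names))) return(default)
  x
}

check_named_sketch(c(one = 1, two = 2, 3), c("one", "two", "three"), NULL)  # unnamed third value
check_named_sketch(list(one = 1, two = 2, three = 3), list("one", "two", "three"), NULL)
```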
Subset of data from the National Longitudinal Survey of Youth (NLSY) 1979 Children and Young Adults. This dataset includes survey responses about feelings and behaviors linked to depressive symptoms in children and young adults. For more information about the National Longitudinal Survey of Youth, visit: https://www.nlsinfo.org/.
depressive
A data frame with 11,551 rows and 12 columns:
Child identification number
race of child (1 = Hispanic, 2 = Black, 3 = Non-Black, Non-Hispanic)
sex of child (1 = male, 2 = female)
year of child's birth
how often child feels sad and blue (1 = often, 2 = sometimes, 3 = hardly ever)
how often child feels nervous, tense, or on edge (1 = often, 2 = sometimes, 3 = hardly ever)
how often child feels happy (1 = often, 2 = sometimes, 3 = hardly ever)
how often child feels bored (1 = often, 2 = sometimes, 3 = hardly ever)
how often child feels lonely (1 = often, 2 = sometimes, 3 = hardly ever)
how often child feels tired or worn out (1 = often, 2 = sometimes, 3 = hardly ever)
how often child feels excited about something (1 = often, 2 = sometimes, 3 = hardly ever)
how often child feels too busy to get everything (1 = often, 2 = sometimes, 3 = hardly ever)
mean_group_tbl() calculates summary statistics (i.e.,
mean, median, standard deviation, minimum, maximum, and count of
non-missing values) for continuous (i.e., interval and ratio-level)
variables, grouped either by another variable in your dataset or by
a matched pattern in the variable names.
mean_group_tbl(data, var_stem, group, var_input = "stem", regex_stem = FALSE, ignore_stem_case = FALSE, group_type = "variable", group_name = NULL, regex_group = FALSE, ignore_group_case = FALSE, remove_group_non_alnum = TRUE, na_removal = "listwise", only = NULL, var_labels = NULL, ignore = NULL)
data |
A data frame. |
var_stem |
A character vector with one or more elements, where each
represents either a variable stem or the complete name of a variable present
in |
group |
A character string representing a variable name or a pattern
used to search for variables in |
var_input |
A character string specifying whether the values supplied
to |
regex_stem |
A logical value indicating whether to use Perl-compatible
regular expressions when searching for variable stems. Default is |
ignore_stem_case |
A logical value indicating whether the search for
columns matching the supplied |
group_type |
A character string that defines how the |
group_name |
An optional character string used to rename the |
regex_group |
A logical value indicating whether to use Perl-compatible
regular expressions when searching for |
ignore_group_case |
A logical value specifying whether the search for a
grouping variable (if |
remove_group_non_alnum |
A logical value indicating whether to remove
all non-alphanumeric characters (i.e., anything that is not a letter or
number) from |
na_removal |
A character string specifying how missing values are
handled. Must be one of
|
only |
A character string or vector of character strings specifying
which summary statistics to return. Defaults to |
var_labels |
An optional named character vector or list used to assign
custom labels to variable names. Each element must be named and correspond
to a variable included in the returned table. If |
ignore |
An optional named vector or list indicating values to exclude
from variables matching specified stems (or names), and, if applicable, from
a grouping variable in |
A tibble showing summary statistics for continuous variables, grouped either by a specified variable in the dataset or by matching patterns in variable names.
Ama Nyame-Mensah
sdoh_child_ages_region <- dplyr::select(sdoh, c(REGION, ACS_PCT_AGE_0_4, ACS_PCT_AGE_5_9, ACS_PCT_AGE_10_14, ACS_PCT_AGE_15_17))
mean_group_tbl(data = sdoh_child_ages_region, var_stem = "ACS_PCT_AGE", group = "REGION", group_name = "us_region", na_removal = "pairwise",
               var_labels = c(
                 ACS_PCT_AGE_0_4 = "% of population between ages 0-4",
                 ACS_PCT_AGE_5_9 = "% of population between ages 5-9",
                 ACS_PCT_AGE_10_14 = "% of population between ages 10-14",
                 ACS_PCT_AGE_15_17 = "% of population between ages 15-17"))
set.seed(0222)
grouped_data <- data.frame(
  symptoms.t1 = sample(c(0:10, -999), replace = TRUE, size = 50),
  symptoms.t2 = sample(c(NA, 0:10, -999), replace = TRUE, size = 50)
)
mean_group_tbl(data = grouped_data, var_stem = "symptoms", group = ".t\\d", group_type = "pattern", na_removal = "listwise", ignore = c(symptoms = -999))
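For `group_type = "pattern"`, the group label comes from the column names rather than from a separate column. The sketch below is an assumption-laden approximation of that idea on made-up data, not the package's implementation: it extracts the `.t\\d` suffix with `regexpr()`, strips non-alphanumeric characters (as `remove_group_non_alnum = TRUE` is described to do), treats `-999` as an ignored value, and drops missing values one column at a time.

```r
# Toy wide data (hypothetical): one stem measured at two time points
wide <- data.frame(
  symptoms.t1 = c(2, 5, -999, 7),
  symptoms.t2 = c(3, NA, 4, -999)
)

# Columns that share the stem
stem_cols <- grep("^symptoms", names(wide), value = TRUE)

# Group label from each column name; "." is a regex wildcard that
# here matches the literal dot before "t1"/"t2"
labels <- regmatches(stem_cols, regexpr(".t\\d", stem_cols, perl = TRUE))

# Mimic remove_group_non_alnum = TRUE: keep only letters and digits
labels <- gsub("[^[:alnum:]]", "", labels)

# Treat -999 as an ignored value by turning it into NA
values <- lapply(wide[stem_cols], function(v) replace(v, which(v == -999), NA))

# Summary statistics per group, dropping NAs within each column
stats <- do.call(rbind, lapply(seq_along(values), function(i) {
  v <- values[[i]][!is.na(values[[i]])]
  data.frame(group = labels[i], mean = mean(v), median = median(v),
             sd = sd(v), min = min(v), max = max(v), nonmiss = length(v))
}))
stats
```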
mean_tbl() calculates summary statistics (i.e., mean,
median, standard deviation, minimum, maximum, and count of non-missing
values) for continuous (i.e., interval and ratio-level) variables.
mean_tbl(data, var_stem, var_input = "stem", regex_stem = FALSE, ignore_stem_case = FALSE, na_removal = "listwise", only = NULL, var_labels = NULL, ignore = NULL)
data |
A data frame. |
var_stem |
A character vector with one or more elements, where each
represents either a variable stem or the complete name of a variable present
in |
var_input |
A character string specifying whether the values supplied
to |
regex_stem |
A logical value indicating whether to use Perl-compatible
regular expressions when searching for variable stems. Default is |
ignore_stem_case |
A logical value indicating whether the search for
columns matching the supplied |
na_removal |
A character string specifying how missing values are
handled. Must be one of
|
only |
A character string or vector of character strings specifying
which summary statistics to return. Defaults to |
var_labels |
An optional named character vector or list used to assign
custom labels to variable names. Each element must be named and correspond
to a variable included in the returned table. If |
ignore |
An optional named vector or list indicating values to exclude
from variables matching specified stems (or names). Defaults to |
A tibble showing summary statistics for continuous variables.
Ama Nyame-Mensah
sdoh_child_ages <- dplyr::select(sdoh, c(ACS_PCT_AGE_0_4, ACS_PCT_AGE_5_9, ACS_PCT_AGE_10_14, ACS_PCT_AGE_15_17))
mean_tbl(data = sdoh_child_ages, var_stem = "ACS_PCT_AGE")
mean_tbl(data = sdoh_child_ages, var_stem = "ACS_PCT_AGE", na_removal = "pairwise",
         var_labels = c(
           ACS_PCT_AGE_0_4 = "% of population between ages 0-4",
           ACS_PCT_AGE_5_9 = "% of population between ages 5-9",
           ACS_PCT_AGE_10_14 = "% of population between ages 10-14",
           ACS_PCT_AGE_15_17 = "% of population between ages 15-17"))
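The difference between `na_removal = "listwise"` and `"pairwise"` can be seen in a small base R sketch. It assumes the two options follow their conventional meanings: listwise keeps only rows that are complete on every selected variable, while pairwise drops missing values separately for each variable. The helper `summarise_cols()` and the toy data are hypothetical.

```r
# Toy percentages (made up, not the sdoh data)
ages <- data.frame(
  age_0_4 = c(5.1, NA, 6.0, 4.8),
  age_5_9 = c(6.2, 5.9, NA, 6.1)
)

# Hypothetical helper: one row of summary statistics per column
summarise_cols <- function(df) {
  do.call(rbind, lapply(names(df), function(col) {
    v <- df[[col]][!is.na(df[[col]])]
    data.frame(variable = col, mean = mean(v), median = median(v),
               sd = sd(v), min = min(v), max = max(v), nonmiss = length(v))
  }))
}

# Listwise: keep only rows complete on every selected variable
listwise <- summarise_cols(ages[complete.cases(ages), ])

# Pairwise: drop missing values separately for each variable
pairwise <- summarise_cols(ages)

listwise
pairwise
```

Listwise keeps every statistic on the same set of rows, at the cost of discarding partially observed rows; pairwise uses all available data per variable, so the non-missing counts can differ between variables.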
These data are a subset of the National Longitudinal Survey of Youth (NLSY) 1979 Children and Young Adults. The data contain 2,976 observations and 11 variables.
For more information about the National Longitudinal Survey of Youth, visit https://www.nlsinfo.org/.
nlsy
A tibble with 2,976 rows and 11 columns:
Child identification number
race of child (Hispanic, Black, Non-Black, Non-Hispanic)
gender of child (1 = male, 0 = female)
birth order of child
Age of mother at birth of child
whether child was born low birth weight (1 = yes, 0 = no)
whether child was breastfed (1 = yes, 0 = no)
Highest grade completed by child’s mother
PIAT Math Standard Score
PIAT Reading Recognition Standard Score
Number of members in the household
Subset of data from the 2020 Social Determinants of Health (SDOH) Database. For more information about the 2020 SDOH Database, visit: https://www.ahrq.gov/sdoh/index.html.
sdoh
A tibble with 3,229 rows and 29 columns:
SDOH file year
State-county FIPS Code (5-digit)
State FIPS Code (2-digit)
State name
County name
Census region name
Territory indicator (1 = U.S. Territory, 0 = U.S. State or DC)
Percentage of population between ages 0-4
Percentage of population between ages 5-9
Percentage of population between ages 10-14
Percentage of population between ages 15-17
Monthly (January) precipitation (Inches)
Monthly (February) precipitation (Inches)
Monthly (March) precipitation (Inches)
Monthly (April) precipitation (Inches)
Monthly (May) precipitation (Inches)
Monthly (June) precipitation (Inches)
Monthly (July) precipitation (Inches)
Monthly (August) precipitation (Inches)
Monthly (September) precipitation (Inches)
Monthly (October) precipitation (Inches)
Monthly (November) precipitation (Inches)
Monthly (December) precipitation (Inches)
Percentage of home health agencies offering nursing care services
Percentage of home health agencies offering physical therapy services
Percentage of home health agencies offering occupational therapy services
Percentage of home health agencies offering speech pathology services
Percentage of home health agencies offering medical social services
Percentage of home health agencies offering home health aide services
select_group_tbl() displays frequency counts and
percentages for multiple response variables (e.g., a series of
questions where participants answer "Yes" or "No" to each item) as
well as ordinal variables (such as Likert or Likert-type items with
responses ranging from "Strongly Disagree" to "Strongly Agree", where
respondents select one response per statement, question, or item),
grouped either by another variable in your dataset or by a matched
pattern in the variable names.
select_group_tbl(data, var_stem, group, var_input = "stem", regex_stem = FALSE, ignore_stem_case = FALSE, group_type = "variable", group_name = NULL, margins = "all", regex_group = FALSE, ignore_group_case = FALSE, remove_group_non_alnum = TRUE, na_removal = "listwise", pivot = "longer", only = NULL, var_labels = NULL, ignore = NULL, force_pivot = FALSE)
data |
A data frame. |
var_stem |
A character vector with one or more elements, where each
represents either a variable stem or the complete name of a variable present
in |
group |
A character string representing a variable name or a pattern
used to search for variables in |
var_input |
A character string specifying whether the values supplied
to |
regex_stem |
A logical value indicating whether to use Perl-compatible
regular expressions when searching for variable stems. Default is |
ignore_stem_case |
A logical value indicating whether the search for
columns matching the supplied |
group_type |
A character string that defines how the |
group_name |
An optional character string used to rename the |
margins |
A character string that determines how percentage values are
calculated: whether they sum to one across rows, columns, or the entire
variable (i.e., all). Defaults to |
regex_group |
A logical value indicating whether to use Perl-compatible
regular expressions when searching for |
ignore_group_case |
A logical value specifying whether the search for a
grouping variable (if |
remove_group_non_alnum |
A logical value indicating whether to remove
all non-alphanumeric characters (i.e., anything that is not a letter or
number) from |
na_removal |
A character string specifying how missing values are
handled. Must be one of
|
pivot |
A character string that determines the format of the table. By
default, |
only |
A character string or vector of character strings of the types of
summary data to return. Default is |
var_labels |
An optional named character vector or list used to assign
custom labels to variable names. Each element must be named and correspond
to a variable included in the returned table. If |
ignore |
An optional named vector or list indicating values to exclude
from variables matching specified stems (or names), and, if applicable, from a
grouping variable in |
force_pivot |
A logical value that enables pivoting to the 'wider' format
even when variables have inconsistent value sets. By default, this is set to
|
A tibble displaying the count and percentage for each category in a multiple response or ordinal variable, grouped either by a specified variable in the dataset or by matching patterns in variable names.
Ama Nyame-Mensah
select_group_tbl(data = stem_social_psych, var_stem = "belong_belong", group = "\\d", group_type = "pattern", group_name = "wave", na_removal = "pairwise", pivot = "wider", only = "count")
tas_recoded <- tas |>
  dplyr::mutate(sex = dplyr::case_when(
    sex == 1 ~ "female",
    sex == 2 ~ "male",
    TRUE ~ NA)) |>
  dplyr::mutate(dplyr::across(
    .cols = dplyr::starts_with("involved_"),
    .fns = ~ dplyr::case_when(
      .x == 1 ~ "selected",
      .x == 0 ~ "unselected",
      TRUE ~ NA)
  ))
select_group_tbl(data = tas_recoded, var_stem = "involved_", group = "sex", group_type = "variable", na_removal = "pairwise", pivot = "wider")
depressive_recoded <- depressive |>
  dplyr::mutate(sex = dplyr::case_when(
    sex == 1 ~ "male",
    sex == 2 ~ "female",
    TRUE ~ NA)) |>
  dplyr::mutate(dplyr::across(
    .cols = dplyr::starts_with("dep_"),
    .fns = ~ dplyr::case_when(
      .x == 1 ~ "often",
      .x == 2 ~ "sometimes",
      .x == 3 ~ "hardly",
      TRUE ~ NA)
  ))
select_group_tbl(data = depressive_recoded, var_stem = "dep", group = "sex", group_type = "variable", na_removal = "listwise", pivot = "wider", only = "percent",
                 var_labels = c(
                   dep_1 = "how often child feels sad and blue",
                   dep_2 = "how often child feels nervous, tense, or on edge",
                   dep_3 = "how often child feels happy",
                   dep_4 = "how often child feels bored",
                   dep_5 = "how often child feels lonely",
                   dep_6 = "how often child feels tired or worn out",
                   dep_7 = "how often child feels excited about something",
                   dep_8 = "how often child feels too busy to get everything"))
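A rough base R sketch of a grouped frequency table for several multiple response items, in the spirit of `select_group_tbl(..., group_type = "variable")`. The toy `survey` data are invented, missing values are dropped item by item (a pairwise-style choice), and percentages are computed over each whole item-by-group table, which is our reading of `margins = "all"`; the package's actual output layout may differ.

```r
# Toy multiple response data (hypothetical), already recoded to labels
survey <- data.frame(
  sex = c("female", "male", "female", "male", "female"),
  involved_arts = c("selected", "unselected", "selected", "selected", NA),
  involved_sports = c("unselected", "selected", "unselected", "selected", "selected")
)
items <- grep("^involved_", names(survey), value = TRUE)

long <- do.call(rbind, lapply(items, function(item) {
  # Drop rows missing this item or the grouping variable
  keep <- !is.na(survey[[item]]) & !is.na(survey$sex)
  tab <- table(value = survey[[item]][keep], sex = survey$sex[keep])
  out <- as.data.frame(tab, responseName = "count", stringsAsFactors = FALSE)
  # Percentages sum to 1 across the whole item-by-group table
  out$percent <- out$count / sum(out$count)
  data.frame(variable = item, out)
}))
long
```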
select_tbl() displays frequency counts and percentages
for multiple response variables (e.g., a series of questions where
participants answer "Yes" or "No" to each item) as well as ordinal
variables (such as Likert or Likert-type items with responses ranging
from "Strongly Disagree" to "Strongly Agree", where respondents select
one response per statement, question, or item).
select_tbl(data, var_stem, var_input = "stem", regex_stem = FALSE, ignore_stem_case = FALSE, na_removal = "listwise", pivot = "longer", only = NULL, var_labels = NULL, ignore = NULL, force_pivot = FALSE)
data |
A data frame. |
var_stem |
A character vector with one or more elements, where each
represents either a variable stem or the complete name of a variable present
in |
var_input |
A character string specifying whether the values
supplied to |
regex_stem |
A logical value indicating whether to use Perl-compatible
regular expressions when searching for variable stems. Default is |
ignore_stem_case |
A logical value indicating whether the search for
columns matching the supplied |
na_removal |
A character string specifying how missing values are
handled. Must be one of
|
pivot |
A character string that determines the format of the table. By
default, |
only |
A character string or vector of character strings of the types of
summary data to return. Default is |
var_labels |
An optional named character vector or list used to assign
custom labels to variable names. Each element must be named and correspond
to a variable included in the returned table. If |
ignore |
An optional named vector or list indicating values to exclude
from variables matching specified stems (or names). Defaults to |
force_pivot |
A logical value that enables pivoting to the 'wider'
format even when variables have inconsistent value sets. By default, this is
set to |
A tibble displaying the count and percentage for each category in a multiple response or ordinal variable.
Ama Nyame-Mensah
select_tbl(data = tas, var_stem = "involved_", na_removal = "pairwise")
select_tbl(data = depressive, var_stem = "dep", na_removal = "listwise", pivot = "wider", only = "percent")
var_label_example <- c(
  dep_1 = "how often child feels sad and blue",
  dep_2 = "how often child feels nervous, tense, or on edge",
  dep_3 = "how often child feels happy",
  dep_4 = "how often child feels bored",
  dep_5 = "how often child feels lonely",
  dep_6 = "how often child feels tired or worn out",
  dep_7 = "how often child feels excited about something",
  dep_8 = "how often child feels too busy to get everything")
select_tbl(data = depressive, var_stem = "dep", na_removal = "pairwise", pivot = "longer", var_labels = var_label_example)
select_tbl(data = depressive, var_stem = "dep", na_removal = "pairwise", pivot = "wider", only = "count", var_labels = var_label_example)
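The `pivot` argument switches between a long table (one row per item and response value) and a wide one (one row per item, one column per response value). The hedged sketch below shows both shapes with base R's `stack()`, `table()`, and `xtabs()` on made-up `dep_*` responses; it is not the package's code.

```r
# Made-up ordinal responses for two items (not the depressive data)
items <- data.frame(
  dep_1 = c("often", "sometimes", "hardly", "often"),
  dep_2 = c("sometimes", "sometimes", "hardly", "often"),
  stringsAsFactors = FALSE
)

# Stack the items into two columns: values and ind (the item name)
stacked <- stack(items)

# Longer: one row per item-by-response combination
longer <- as.data.frame(table(variable = stacked$ind, value = stacked$values),
                        responseName = "count", stringsAsFactors = FALSE)
# Percentages within each item
longer$percent <- ave(as.numeric(longer$count), longer$variable,
                      FUN = function(n) n / sum(n))

# Wider: one row per item, one column per response value
wider <- xtabs(count ~ variable + value, data = longer)

longer
wider
```

Both items here share the same three response values. When items have different value sets, a wide table needs cells for values that some items never took, which is presumably why `select_tbl()` only pivots such data wider when `force_pivot` is set.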
Simulated data designed to reflect social psychological responses among college students. These data were generated to model the attitudes, perceptions, and experiences of students participating in a Science, Technology, Engineering, and Mathematics (STEM) intervention program. The dataset aims to represent real-world psychological factors relevant to STEM education contexts.
stem_social_psych
A tibble with 786 rows and 37 columns:
Student ID number
I feel like I belong in STEM (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
I feel like an outsider in STEM (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
STEM is a big part of who I am. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
I feel welcomed in STEM workplaces (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
I do not have much in common with the other students in my STEM classes. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
pass my STEM courses. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
learn the foundations and concepts of scientific thinking. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
do well in a STEM-related field. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
quickly learn new science areas, systems, techniques or concepts on my own. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
contribute to a science project. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
clearly communicate scientific problems and findings to varied audiences (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
become a scientist. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
complete an undergraduate STEM degree. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
get admitted to a graduate STEM program. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
be successful in a graduate STEM program. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
I feel like I belong in STEM (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
I feel like an outsider in STEM. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
STEM is a big part of who I am. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
I feel welcomed in STEM workplaces. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
I do not have much in common with the other students in my STEM classes. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
pass my STEM courses. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
learn the foundations and concepts of scientific thinking. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
do well in a STEM-related field. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
quickly learn new science areas, systems, techniques or concepts on my own. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
contribute to a science project. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
clearly communicate scientific problems and findings to varied audiences (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
become a scientist. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
complete an undergraduate STEM degree. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
get admitted to a graduate STEM program. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
be successful in a graduate STEM program. (1=Strongly disagree, 2=Somewhat disagree, 3=Neither disagree nor agree, 4=Somewhat agree, 5=Strongly agree)
Participant's current sex (0=Not Male, 1=Male)
Whether participant has a disability (0=No, 1=Yes)
Whether participant is a first generation college student (0=No, 1=Yes)
Whether participant is a STEM Major (0=No, 1=Yes)
Whether student has participated in an experiential learning program, such as an internship, research, or leadership opportunity. (0=No, 1=Yes)
Whether participant is Asian, Middle Eastern/Arab or White (0) vs. Black, Indigenous, Hispanic/Latino, or Mixed Race (1)
Subset of data from the Panel Study of Income Dynamics (PSID) Transition into Adulthood Supplement. This dataset includes information from young adults about how they spend their free time, including participation in organized activities such as clubs, sports or athletic teams, social-action groups, and other structured extracurricular engagements. For more information about the Panel Study of Income Dynamics, visit: https://psidonline.isr.umich.edu/GettingStarted.aspx.
tas
A tibble with 2,526 rows and 8 columns:
Personal identification number
sex of individual (1 = female, 2 = male)
whether the individual participated in any organized activities related to art, music, or the theater in the last 12 months (1 = yes, 0 = no)
whether the individual was a member of any athletic or sports teams in the last 12 months (1 = yes, 0 = no)
whether the individual was involved with any high school or college clubs or student government in the last 12 months (1 = yes, 0 = no)
whether the individual voted in the national election in November 2016 that was held to elect the President (1 = yes, 0 = no)
whether the individual was involved in any political groups, solidarity or ethnic-support groups or social-action groups in the last 12 months (1 = yes, 0 = no)
whether the individual was involved in any unpaid volunteer or community service work in the last 12 months (1 = yes, 0 = no)
Social Psychological (Simulated) Data
Description
Simulated data capturing social psychological responses in a real-world college setting. This dataset represents college students' feelings, attitudes, and perceptions related to their experiences in STEM degree programs. It was designed to reflect key psychological factors that influence student engagement, motivation, and persistence in STEM fields.
Usage
Format
A tibble with 10,200 rows and 17 columns:
Participant ID number
I feel like I belong at this institution (1=Strongly Disagree, 2=Disagree, 3=Neither agree nor disagree, 4=Agree, 5=Strongly Agree)
I feel like part of the community (1=Strongly Disagree, 2=Disagree, 3=Neither agree nor disagree, 4=Agree, 5=Strongly Agree)
I feel valued by this institution (1=Strongly Disagree, 2=Disagree, 3=Neither agree nor disagree, 4=Agree, 5=Strongly Agree)
This institution is a big part of who I am (1=Strongly Disagree, 2=Disagree, 3=Neither agree nor disagree, 4=Agree, 5=Strongly Agree)
I feel comfortable being myself in this setting (1=Strongly Disagree, 2=Disagree, 3=Neither agree nor disagree, 4=Agree, 5=Strongly Agree)
This institution is a big part of who I am (1=Strongly Disagree, 2=Disagree, 3=Neither agree nor disagree, 4=Agree, 5=Strongly Agree)
I care about doing well at this institution (1=Strongly Disagree, 2=Disagree, 3=Neither agree nor disagree, 4=Agree, 5=Strongly Agree)
I am confident about A (1=Strongly Disagree, 2=Disagree, 3=Neither agree nor disagree, 4=Agree, 5=Strongly Agree)
I am confident about B (1=Strongly Disagree, 2=Disagree, 3=Neither agree nor disagree, 4=Agree, 5=Strongly Agree)
I am confident about C (1=Strongly Disagree, 2=Disagree, 3=Neither agree nor disagree, 4=Agree, 5=Strongly Agree)
I am confident about D (1=Strongly Disagree, 2=Disagree, 3=Neither agree nor disagree, 4=Agree, 5=Strongly Agree)
I am confident about E (1=Strongly Disagree, 2=Disagree, 3=Neither agree nor disagree, 4=Agree, 5=Strongly Agree)
I am confident about F (1=Strongly Disagree, 2=Disagree, 3=Neither agree nor disagree, 4=Agree, 5=Strongly Agree)
I am confident about G (1=Strongly Disagree, 2=Disagree, 3=Neither agree nor disagree, 4=Agree, 5=Strongly Agree)
Participant's gender identity (1=Woman, 2=Man, 3=Non-binary, 4=Self-identify, 5=Transgender, 6=Gender-queer/non-conforming)
Participant's citizenship status (1=U.S. citizen, 2=Non-U.S. citizen with permanent residency, 3=Non-U.S. citizen with temporary visa, 4=Other)