Filter feed or dataset results by validation quality thresholds. This is a
convenience wrapper around get_validation_report() that returns the original
data filtered to only include feeds/datasets meeting your quality criteria.
Note: This function does not currently support GBFS validation reports. GBFS reports are served from a different endpoint and use different validation criteria.
Usage
filter_by_validation(
  data,
  max_errors = NULL,
  max_warnings = NULL,
  max_info = NULL,
  require_validation = TRUE
)
Arguments
- data
  A tibble from feeds(), mobdb_datasets(), or mobdb_search().
- max_errors
  Maximum number of validation errors allowed. Use 0 for error-free feeds. If NULL (default), no error filtering is applied.
- max_warnings
  Maximum number of validation warnings allowed. If NULL (default), no warning filtering is applied.
- max_info
  Maximum number of informational notices allowed. If NULL (default), no info filtering is applied.
- require_validation
  Logical. If TRUE (default), exclude feeds/datasets that have no validation data. If FALSE, include them in results.
Value
A filtered version of the input data frame containing only feeds/datasets that meet the specified quality criteria.
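The way the thresholds and require_validation combine can be sketched in base R. This is a minimal illustration only, not the package's implementation: the helper name sketch_keep() is hypothetical, and it assumes that missing validation data shows up as NA counts and that an NA count never fails a threshold by itself.

```r
# Hypothetical helper (not part of the package): returns a logical vector
# marking which rows would be kept under the documented argument semantics.
sketch_keep <- function(errors, warnings, info,
                        max_errors = NULL, max_warnings = NULL,
                        max_info = NULL, require_validation = TRUE) {
  # A NULL limit disables that filter; NA counts pass here and are
  # handled separately by require_validation (an assumption).
  within_limit <- function(counts, limit) {
    if (is.null(limit)) return(rep(TRUE, length(counts)))
    is.na(counts) | counts <= limit
  }

  # Assumption: a row has no validation data when all three counts are NA.
  has_report <- !(is.na(errors) & is.na(warnings) & is.na(info))

  keep <- within_limit(errors, max_errors) &
    within_limit(warnings, max_warnings) &
    within_limit(info, max_info)

  if (require_validation) keep <- keep & has_report
  keep
}

# Three validated feeds plus one feed without a validation report
errors   <- c(0L, 5L, 100L, NA)
warnings <- c(10L, 50L, 500L, NA)
info     <- c(5L, 10L, 20L, NA)

sketch_keep(errors, warnings, info, max_errors = 0)
#> [1]  TRUE FALSE FALSE FALSE
sketch_keep(errors, warnings, info, max_errors = 10, max_warnings = 100,
            require_validation = FALSE)
#> [1]  TRUE  TRUE FALSE  TRUE
```

All supplied thresholds must hold at once, so adding a second threshold can only narrow the result.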
See also
get_validation_report() to inspect validation metrics,
view_validation_report() to view full validation reports
Examples
# Create sample data with validation information (search results structure)
sample_data <- tibble::tibble(
  id = c("mdb-1", "mdb-2", "mdb-3"),
  provider = c("Agency A", "Agency B", "Agency C"),
  latest_dataset = tibble::tibble(
    id = c("mdb-1-202501", "mdb-2-202501", "mdb-3-202501"),
    validation_report = tibble::tibble(
      total_error = c(0L, 5L, 100L),
      total_warning = c(10L, 50L, 500L),
      total_info = c(5L, 10L, 20L)
    )
  )
)
# Filter to feeds with zero errors
filter_by_validation(sample_data, max_errors = 0)
#> Filtered to 1 of 3 items matching quality criteria.
#> # A tibble: 1 × 3
#> id provider latest_dataset$id $validation_report$total_er…¹ $$total_warning
#> <chr> <chr> <chr> <int> <int>
#> 1 mdb-1 Agency A mdb-1-202501 0 10
#> # ℹ abbreviated name: ¹$validation_report$total_error
#> # ℹ 1 more variable: latest_dataset$validation_report$total_info <int>
# Filter with multiple criteria
filter_by_validation(sample_data, max_errors = 10, max_warnings = 100)
#> Filtered to 2 of 3 items matching quality criteria.
#> # A tibble: 2 × 3
#> id provider latest_dataset$id $validation_report$total_er…¹ $$total_warning
#> <chr> <chr> <chr> <int> <int>
#> 1 mdb-1 Agency A mdb-1-202501 0 10
#> 2 mdb-2 Agency B mdb-2-202501 5 50
#> # ℹ abbreviated name: ¹$validation_report$total_error
#> # ℹ 1 more variable: latest_dataset$validation_report$total_info <int>
if (FALSE) { # mobdb_can_run_examples()
  # With real API data:
  ca_feeds <- feeds(
    country_code = "US",
    subdivision_name = "California",
    data_type = "gtfs"
  )
  clean_feeds <- filter_by_validation(ca_feeds, max_errors = 0)
}
