import pointblank as pb

schema = pb.Schema(
    is_active=pb.bool_field(p_true=0.8),
    is_premium=pb.bool_field(p_true=0.2),
    is_verified=pb.bool_field(),
)
pb.preview(pb.generate_dataset(schema, n=100, seed=23))
Create a boolean column specification for use in a schema.
USAGE
The bool_field() function defines the constraints and behavior for a boolean column when generating synthetic data with generate_dataset(). The p_true= parameter controls the probability of generating True values, which is useful for simulating real-world distributions where events may be rare or common (e.g., 5% fraud rate, 80% active users).
By default, True and False are equally likely (p_true=0.5). Setting p_true=0.0 produces all False values, and p_true=1.0 produces all True values.
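As a rough mental model of what p_true= means, each value can be thought of as an independent draw that comes up True with probability p_true. The sketch below is plain Python and is not pointblank's actual implementation; `sample_bools` is a hypothetical helper introduced only for illustration.

```python
import random


def sample_bools(n, p_true=0.5, seed=None):
    """Draw n booleans, each True with probability p_true (hypothetical helper)."""
    if not 0.0 <= p_true <= 1.0:
        raise ValueError("p_true must be between 0.0 and 1.0")
    rng = random.Random(seed)
    # rng.random() lies in [0.0, 1.0), so p_true=0.0 never yields True
    # and p_true=1.0 always does.
    return [rng.random() < p_true for _ in range(n)]


# A rare event, such as a fraud flag that fires about 5% of the time.
rare = sample_bools(1000, p_true=0.05, seed=23)
print(sum(rare) / len(rare))  # an observed rate near 0.05, not exactly 0.05
```

The observed share of True values only approximates p_true, and the approximation tightens as the number of rows grows.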
PARAMETERS
p_true : float = 0.5
    Probability of generating True. Default is 0.5 (equal probability). Must be between 0.0 and 1.0.
nullable : bool = False
    Whether the column can contain null values. Default is False.
null_probability : float = 0.0
    Probability of generating a null value for each row when nullable=True. Must be between 0.0 and 1.0. Default is 0.0.
unique : bool = False
    Whether all values must be unique. Default is False. Note that boolean columns can only have 2 unique non-null values, so n must be <= 2 when unique=True (or <= 3 with nullable=True).
generator : Callable[[], Any] | None = None
    Custom callable that generates values. When provided, this overrides all other constraints. The callable should take no arguments and return a single boolean value.
RETURNS
BoolField
    A boolean field specification that can be passed to Schema().
RAISES
ValueError
    If p_true is not between 0.0 and 1.0, or if null_probability is not between 0.0 and 1.0.
The p_true= parameter controls the distribution of True/False values, allowing you to simulate different probabilities:
Polars, 100 rows, 3 columns (first and last 5 rows shown)

| | is_active (Boolean) | is_premium (Boolean) | is_verified (Boolean) |
|---|---|---|---|
| 1 | False | False | False |
| 2 | False | False | False |
| 3 | False | False | False |
| 4 | True | True | True |
| 5 | True | False | False |
| 96 | True | False | True |
| 97 | True | False | True |
| 98 | False | False | False |
| 99 | False | False | False |
| 100 | False | False | False |
Optional boolean flags can be simulated by combining nullable=True with null_probability=:
Polars, 50 rows, 2 columns (first and last 5 rows shown)

| | opted_in (Boolean) | has_referral (Boolean) |
|---|---|---|
| 1 | False | None |
| 2 | True | True |
| 3 | True | None |
| 4 | True | True |
| 5 | False | False |
| 46 | True | True |
| 47 | True | None |
| 48 | True | True |
| 49 | False | False |
| 50 | True | None |
Boolean fields can be combined with other field types in a realistic schema:
Polars, 30 rows, 4 columns (first and last 5 rows shown)

| | user_id (Int64) | name (String) | email_verified (Boolean) | is_admin (Boolean) |
|---|---|---|---|---|
| 1 | 300544187282452692 | Jackson Oliver | True | False |
| 2 | 4450845842995915857 | Adam Schultz | True | False |
| 3 | 136805169105849017 | Adalyn Webb | True | False |
| 4 | 4266552081017657985 | Thomas Duffy | True | False |
| 5 | 4531409703019074138 | Denise Hill | True | False |
| 26 | 4055035012371305120 | Brandon Hoffman | False | False |
| 27 | 3461005733469027473 | Cecilia Klein | True | False |
| 28 | 5376594645785815587 | Ryan Whitaker | True | True |
| 29 | 2173523867229340168 | Theodore Bennett | True | True |
| 30 | 1798126872665261570 | Walker Ryan | True | False |