This function generates random data that conforms to a schema’s column definitions. When the schema is defined using Field objects with constraints (e.g., min_val=, max_val=, pattern=, preset=), the generated data will respect those constraints.
The schema object defining the structure and constraints of the data to generate. Each column can be specified using a field helper function (e.g., int_field(), string_field()) for fine-grained control, or as a simple dtype string (e.g., "Int64", "String") for unconstrained generation.
n:int=100
Number of rows to generate. The default is 100.
seed:int | None=None
Random seed for reproducibility. If provided, the same seed will produce the same data. Default is None (non-deterministic).
Output format for the generated data. Options are: (1) "polars" (the default) returns a Polars DataFrame, (2) "pandas" returns a Pandas DataFrame, and (3) "dict" returns a dictionary of lists.
country:str='US'
Country code for locale-aware generation when using presets. Accepts ISO 3166-1 alpha-2 codes (e.g., "US", "DE", "FR") or alpha-3 codes (e.g., "USA", "DEU", "FRA"). This affects the format and content of preset-generated data such as addresses, phone numbers, names, and postal codes. The default is "US".
Returns
DataFrame or dict
Generated data in the requested format.
Raises
ValueError
If the schema has no columns or if constraints cannot be satisfied.
ImportError
If required optional dependencies are not installed.
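The interplay of n=, seed=, field constraints, and the "dict" output shape can be illustrated with a small standard-library sketch. This is not the function's actual implementation: `sketch_generate` and its `{name: (min_val, max_val)}` column spec are hypothetical stand-ins that mimic only the documented behavior (same seed gives the same data, values stay within bounds, and a `ValueError` is raised for an empty schema or an unsatisfiable constraint).

```python
import random


def sketch_generate(columns, n=100, seed=None):
    """Toy stand-in: build a dict of lists from integer (min_val, max_val) ranges."""
    if not columns:
        raise ValueError("schema has no columns")
    # seed=None lets Python pick entropy itself, so runs differ (non-deterministic)
    rng = random.Random(seed)
    data = {}
    for name, (min_val, max_val) in columns.items():
        if min_val > max_val:
            raise ValueError(f"column {name!r}: constraints cannot be satisfied")
        # randint is inclusive on both ends, so both bounds can appear
        data[name] = [rng.randint(min_val, max_val) for _ in range(n)]
    return data


spec = {"age": (18, 65), "score": (0, 100)}  # hypothetical column spec
first = sketch_generate(spec, n=5, seed=23)
second = sketch_generate(spec, n=5, seed=23)
print(first == second)  # identical because both runs share the same seed
```

The returned value has the same shape as the "dict" output option described above: one key per column, each mapped to a list of n values.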
Presets and the country= Parameter
Several string_field() presets produce locale-aware data that varies depending on the country= parameter. The following presets are particularly affected:
Address-related presets ("address", "city", "state", "postcode", "phone_number", "latitude", "longitude"): produce addresses, cities, postal codes, and phone numbers formatted for the specified country. For example, country="DE" yields German street names and PLZ postal codes, while country="JP" yields Japanese addresses.
Person-related presets ("name", "name_full", "first_name", "last_name", "email", "user_name"): produce culturally appropriate names for the specified country. For example, country="FR" produces French names, while country="KR" produces Korean names.
Financial presets ("iban", "ssn", "license_plate"): produce identifiers in the format used by the specified country.
When multiple columns in the same schema use related presets, the generated data is automatically coherent across those columns within each row. Person-related presets will share the same identity (e.g., the email is derived from the name), and address-related presets will share the same location (e.g., the city matches the address).
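One way to picture this row-level coherence is to draw a single identity per row and derive every related column from it, rather than sampling each column independently. The sketch below assumes that approach; `sketch_person_rows`, the tiny name tables, and the `example.com` email format are all hypothetical and are not taken from the library.

```python
import random

# Hypothetical miniature locale data; a real generator would use far larger tables.
FIRST_NAMES = ["Claire", "Louis", "Manon"]
LAST_NAMES = ["Dubois", "Martin", "Bernard"]


def sketch_person_rows(n, seed=None):
    """Build person columns where each row's email is derived from that row's name."""
    rng = random.Random(seed)
    rows = {"first_name": [], "last_name": [], "email": []}
    for _ in range(n):
        # Draw the identity once per row...
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        # ...then fill every person-related column from that same identity.
        rows["first_name"].append(first)
        rows["last_name"].append(last)
        rows["email"].append(f"{first}.{last}@example.com".lower())
    return rows


people = sketch_person_rows(3, seed=1)
for first, last, email in zip(people["first_name"], people["last_name"], people["email"]):
    print(first, last, email)
```

The same idea would apply to address-related columns: choose one location per row, then read the city, postal code, and street from it.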
Supported Countries
The country= parameter currently supports 50 countries with full locale data:
Europe (32 countries): Austria ("AT"), Belgium ("BE"), Bulgaria ("BG"), Croatia ("HR"), Cyprus ("CY"), Czech Republic ("CZ"), Denmark ("DK"), Estonia ("EE"), Finland ("FI"), France ("FR"), Germany ("DE"), Greece ("GR"), Hungary ("HU"), Iceland ("IS"), Ireland ("IE"), Italy ("IT"), Latvia ("LV"), Lithuania ("LT"), Luxembourg ("LU"), Malta ("MT"), Netherlands ("NL"), Norway ("NO"), Poland ("PL"), Portugal ("PT"), Romania ("RO"), Russia ("RU"), Slovakia ("SK"), Slovenia ("SI"), Spain ("ES"), Sweden ("SE"), Switzerland ("CH"), United Kingdom ("GB")
Americas (7 countries): Argentina ("AR"), Brazil ("BR"), Canada ("CA"), Chile ("CL"), Colombia ("CO"), Mexico ("MX"), United States ("US")
Asia-Pacific (10 countries): Australia ("AU"), China ("CN"), Hong Kong ("HK"), India ("IN"), Indonesia ("ID"), Japan ("JP"), New Zealand ("NZ"), Philippines ("PH"), South Korea ("KR"), Taiwan ("TW")
Middle East (1 country): Turkey ("TR")
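Because country= accepts both alpha-2 and alpha-3 codes, a caller-side check against the list above might look like the following sketch. The helper `normalize_country` is hypothetical, the alpha-3 table contains only the three codes mentioned in the parameter description, and raising `ValueError` for an unsupported code is an assumption rather than documented behavior.

```python
# Alpha-2 codes copied from the supported-country list above.
SUPPORTED = {
    # Europe (32)
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE",
    "GR", "HU", "IS", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "NO",
    "PL", "PT", "RO", "RU", "SK", "SI", "ES", "SE", "CH", "GB",
    # Americas (7)
    "AR", "BR", "CA", "CL", "CO", "MX", "US",
    # Asia-Pacific (10)
    "AU", "CN", "HK", "IN", "ID", "JP", "NZ", "PH", "KR", "TW",
    # Middle East (1)
    "TR",
}

# Deliberately partial: only the alpha-3 examples given in the parameter description.
ALPHA3_TO_ALPHA2 = {"USA": "US", "DEU": "DE", "FRA": "FR"}


def normalize_country(code):
    """Map an alpha-3 code to alpha-2 (where known) and check that it is supported."""
    alpha2 = ALPHA3_TO_ALPHA2.get(code, code)
    if alpha2 not in SUPPORTED:
        # Assumed behavior for this sketch; the real function may react differently.
        raise ValueError(f"unsupported country code: {code!r}")
    return alpha2


print(len(SUPPORTED))            # 50, matching the total stated above
print(normalize_country("DEU"))  # DE
print(normalize_country("JP"))   # JP
```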
Examples
Here we define a schema with field constraints and generate test data from it: