Datasets - Running AI tasks on files
PromptLoop streamlines the process of applying AI models to your data. We do this with datasets: upload a CSV or Excel file, select the inputs for the model to use, and launch AI tasks on hundreds to thousands of rows.
Datasets are where you will store, edit, and view data that you are processing with AI tasks.
Helpful Links
Getting Started
This video overview shows how to get started and take full advantage of datasets.
Files and Versions - Datasets are where you can upload the data you are working with on the PromptLoop Platform.
Uploading - You can upload data to launch a job with an existing AI task, or upload a dataset to filter and analyze before running anything.
Generating Datasets - If you don't have any data to start with, you can use our crawlers to pull back thousands of companies that meet your criteria. This is a great place to start if
Run an AI task on each row - When you run an AI task on a dataset, the job runs in the background and is saved as a new version in the dataset when it's complete. This lets you run tens of thousands of operations at once and frees you up for other work.
If you have questions about limits, or have a task that requires a large number of rows, reach out to the team here.
Reliable and Scalable - Datasets are a powerful feature available to all users and one of the core capabilities behind how we think about using AI models efficiently.
How to run an AI task on a spreadsheet file upload
Step 1: Upload a file with data
Datasets let you use any data table in an Excel or CSV file. Our systems automatically detect columns and let you choose which ones to use.
- You can select the columns that you will use as inputs for the task - this is often a single column like a website or search term
- Results from your task - new columns or rows - will be added to the uploaded sheet and available to download as a new file
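Before uploading, it can help to confirm that your input column exists and to drop blank or duplicate inputs so the task does not process the same value twice. This is optional local cleanup, not part of the PromptLoop upload flow. The sketch below uses Python's standard `csv` module; the `website` column name and the `prepare_input_rows` helper are hypothetical.

```python
import csv
import io


def prepare_input_rows(csv_text: str, input_column: str) -> list[dict[str, str]]:
    """Return rows with a non-empty, not-yet-seen value in input_column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or input_column not in reader.fieldnames:
        raise ValueError(f"missing input column: {input_column!r}")
    seen: set[str] = set()
    kept = []
    for row in reader:
        # Short rows yield None for missing fields, so fall back to "".
        value = (row[input_column] or "").strip()
        key = value.lower()  # treat "Acme.com" and "acme.com" as duplicates
        if not value or key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


sample = "company,website\nAcme,acme.com\nBeta,\nAcme Again,ACME.com\nGamma,gamma.io\n"
for row in prepare_input_rows(sample, "website"):
    print(row["company"], row["website"])
```

Here the blank `Beta` row and the repeated `ACME.com` row are dropped, leaving one row each for Acme and Gamma.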
Step 2: Launch the Job
Select your task and launch the job with the correct input columns. Jobs are immediately added to the queue based on your account tier and capacity. You will see progress and results once the task is running. Even extremely large jobs usually complete within 90 minutes.
If you do not already have an AI task created, use the editor to create one, or copy a template from the template library and edit it. The task creation guides will help you get started.
Step 3: View results
Results are saved as a new dataset version for review. You can then search and filter the data before exporting and using the results of the task. You can also run the results through another task right from the Datasets page.
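Filtering can also happen after you export. Assuming the exported file is a CSV with a task result column (named `is_b2b` here purely for illustration), this standard-library sketch keeps only the rows whose result matches a value, ignoring case. The `filter_results` helper is hypothetical.

```python
import csv
import io


def filter_results(csv_text: str, column: str, wanted: str) -> str:
    """Return CSV text with only the rows whose column matches wanted (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Missing or blank cells never match.
        if (row.get(column) or "").strip().lower() == wanted.strip().lower():
            writer.writerow(row)
    return out.getvalue()


exported = "website,is_b2b\nacme.com,Yes\nbeta.com,No\ngamma.io,yes\n"
print(filter_results(exported, "is_b2b", "yes"), end="")
```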

For help setting up your first datasets, or for questions about capacity and running large files, reach out to the team or book a session with us.
Dataset Tools
Generating a Dataset

Dataset generation uses our web crawlers to scan millions of businesses and build relevant datasets from scratch, so you can identify high-quality sales targets. There are currently two options: Geographic Crawl and Global Crawl.
Geographic Crawl - This takes in geographic constraints (e.g., New York State, Atlanta, or the Southeast US) and a keyword (e.g., auto repair shops, hotels, coffee shops) and returns a deduplicated list of all businesses in that region that match the keyword. It crawls tens of millions of businesses with physical addresses and is a great starting point for running more custom tasks.
Global Crawl - This takes in business-specific filters (e.g., B2B, healthcare, headquartered in the US) and uses our crawlers to return a list of thousands of businesses matching those criteria, along with their websites. We pull all of the information directly from their websites, so it is as up-to-date as possible.
Dataset generation is limited based on your account tier and setup. If you would like help increasing your limits, please contact your admin or book a call with the team here.
How to generate a dataset
Step 1

Navigate to the Datasets page and select Generate Dataset.
Step 2

Choose Geographic Crawl or Global Crawl.
Step 3 - Geographic Crawl

Select the regions, cities, and/or specific ZIP codes you want to target and enter a keyword to search for. Provide only one keyword type per generation (e.g., don't enter 'hotels, restaurants, and schools'), because the crawlers will look for businesses that match all of them. Instead, if you need coverage of all three, run three separate generations. We recommend starting with a small subset of the geography to confirm that the keyword returns the right type of business.
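Because each generation should target a single keyword type, an entry like 'hotels, restaurants, and schools' needs to become three separate runs. The hypothetical `split_keywords` helper below splits such an entry on commas only, so multi-word keywords such as 'bed and breakfast' stay intact. It only plans the runs; you would still enter each resulting keyword into its own generation.

```python
import re


def split_keywords(entry: str) -> list[str]:
    """Split a comma-separated entry into one keyword per generation run."""
    keywords: list[str] = []
    for part in entry.split(","):
        # Drop a leading "and"/"or" left over from lists like "x, y, and z".
        part = re.sub(r"^(and|or)\s+", "", part.strip(), flags=re.IGNORECASE)
        # Skip blanks and case-insensitive repeats.
        if part and part.lower() not in {k.lower() for k in keywords}:
            keywords.append(part)
    return keywords


for keyword in split_keywords("hotels, restaurants, and schools"):
    print(f"Run a separate generation for: {keyword}")
```

Splitting on the word "and" instead would break keywords that contain it, which is why the sketch treats commas as the only separator.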
Step 4

Once you click Generate, a dataset row appears while the crawler is working. It updates automatically when the crawl completes, which usually takes a couple of minutes.
Step 5

Once generated, the dataset can be used the same as any other and you can run additional enrichment tasks as needed!
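If you ran several generations, for example one per keyword, the same business can show up in more than one result. Assuming each exported dataset has a website column (the `website` name is hypothetical), this sketch merges the exports and keeps the first row seen for each website domain. It is a local approximation for combining your own exports, not a description of how PromptLoop deduplicates crawl results; `normalize_domain` and `merge_datasets` are illustrative names.

```python
import csv
import io
from urllib.parse import urlparse


def normalize_domain(url: str) -> str:
    """Reduce a URL or bare domain to a lowercase host without a leading 'www.'."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to find the host
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def merge_datasets(csv_texts: list[str], website_column: str = "website") -> list[dict[str, str]]:
    """Concatenate CSV exports, keeping the first row for each website domain."""
    merged, seen = [], set()
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            domain = normalize_domain(row.get(website_column) or "")
            if not domain or domain in seen:
                continue
            seen.add(domain)
            merged.append(row)
    return merged


hotels = "name,website\nGrand Hotel,https://www.grandhotel.com\nInn,inn.example\n"
dining = "name,website\nBistro,bistro.example\nGrand Hotel Dining,http://grandhotel.com/dining\n"
for row in merge_datasets([hotels, dining]):
    print(row["name"], row["website"])
```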
New Column Gen
For any dataset on PromptLoop, you can take advantage of the new column generation feature to quickly format, clean up, or edit the output of a task or change any column.

New Column Gen is available for free on all datasets with all team and enterprise plans. You can use it for quick, reliable reformatting right in the PromptLoop platform.
Just like an Excel function, you can select the existing columns that you want to use as input and context and provide instructions for what you want to accomplish.
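The "select columns, then give an instruction" model can be pictured as a row-wise function. In this sketch a plain Python function stands in for the natural-language instruction, which is a deliberate simplification: New Column Gen interprets your instructions with AI, while the hypothetical `add_column` applies a fixed rule. Like the feature, which saves a new version, it leaves the original rows untouched and returns new ones.

```python
from typing import Callable

Row = dict[str, str]


def add_column(rows: list[Row], input_columns: list[str], new_column: str,
               rule: Callable[..., str]) -> list[Row]:
    """Return copies of rows with new_column computed from the selected input columns."""
    updated = []
    for row in rows:
        # Pass the selected column values, in order, to the rule.
        values = [row.get(col, "") for col in input_columns]
        updated.append({**row, new_column: rule(*values)})
    return updated


rows = [{"city": "Atlanta", "state": "GA"}, {"city": "Austin", "state": "TX"}]
with_location = add_column(rows, ["city", "state"], "location", lambda c, s: f"{c}, {s}")
print([r["location"] for r in with_location])
```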

When you are creating your prompt you can run a quick preview on random rows. This allows you to dial in the final results and iterate quickly.
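Previewing on random rows amounts to applying an instruction to a small sample and inspecting the input/output pairs before committing to the full run. A minimal sketch with Python's `random` module; `preview`, its parameters, and the `strip_scheme` example rule are hypothetical. Passing a `seed` makes the sample repeatable, so you can compare results while you adjust the rule.

```python
import random
import re
from typing import Callable, Optional

Row = dict[str, str]


def preview(rows: list[Row], rule: Callable[[Row], str], k: int = 5,
            seed: Optional[int] = None) -> list[tuple[Row, str]]:
    """Apply rule to a random sample of up to k rows and return (row, output) pairs."""
    rng = random.Random(seed)
    sample = rng.sample(rows, min(k, len(rows)))
    return [(row, rule(row)) for row in sample]


def strip_scheme(row: Row) -> str:
    # Example rule: lowercase the website and remove a leading http:// or https://.
    return re.sub(r"^https?://", "", row["website"].strip().lower())


rows = [{"website": w} for w in ["https://Acme.com", "http://beta.io", "gamma.org", "HTTPS://delta.net"]]
for row, output in preview(rows, strip_scheme, k=2, seed=42):
    print(row["website"], "->", output)
```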

When you click Generate, your instructions run automatically on all rows in your dataset. This runs much faster than a normal task, but allow about 10 minutes per 20,000 rows of data. When it completes, the file with the new generated column is automatically added as a new version.
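The guideline of about 10 minutes per 20,000 rows works out to roughly 1 minute per 2,000 rows. Assuming run time scales linearly with row count (an assumption; actual times vary), a quick estimate looks like this. The `estimate_minutes` name is hypothetical.

```python
# Rule of thumb from the guideline above; linear scaling is an assumption.
MINUTES_PER_20K_ROWS = 10


def estimate_minutes(row_count: int) -> float:
    """Rough New Column Gen run time, scaled linearly from ~10 minutes per 20,000 rows."""
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    return row_count * MINUTES_PER_20K_ROWS / 20_000


print(estimate_minutes(50_000))  # → 25.0
```

For a 50,000-row dataset, that suggests planning for roughly 25 minutes.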