Generate a random number of children for each parent

4

October 3, 2018 by Kenneth Fisher

I was asked an interesting question the other day.

Is it possible to get a different random number of rows back from each application of a cross apply?

The purpose is to create some random demo/testing information. This is one of those cases where an example may be needed for this to make sense. So, let’s say we have a group of students each of whom will have signed up for some classes. In order to create some test data, we want each student to be assigned to a different random number of classes.

CREATE TABLE Students (
	StudentId INT NOT NULL IDENTITY(1,1),
	FirstName varchar(50),
	LastName varchar(50)
	);

CREATE TABLE Classes (
	ClassId INT NOT NULL IDENTITY(1,1),
	Name varchar(50)
	);

INSERT INTO Students VALUES
	('Bob','Smith'),
	('Joe','Jones'),
	('Chris','Cross'),
	('Amy','Fisher'),
	('Barbara','Marshal')

INSERT INTO Classes VALUES
	('Math 101'), ('English 101'), ('Spanish 101'),
	('Theater 101'), ('Music 101'), ('Robotics 101'),
	('History 101'), ('Biology 101'), ('Programming 101'),
	('Math 201'), ('English 201'), ('Spanish 201'),
	('Theater 201'), ('Music 201'), ('Robotics 201'),
	('History 201'), ('Biology 201'), ('Programming 201');

So what I want to get, is a random selection of classes for each student. And not just a random set of values, but a random number of them. To start with I’ll be using TOP and (ABS(CHECKSUM(NEWID()) % 5)) to generate a random number number of rows. I’m also using CROSS APPLY because that will call the subquery once for each row returned by the outer query. At least that’s the way I understand it.

SELECT Students.FirstName, Students.LastName, Classes.Name
FROM Students
CROSS APPLY (SELECT TOP (ABS(CHECKSUM(NEWID()) % 4) + 1) * 
			FROM Classes) Classes
ORDER BY Students.FirstName, Students.LastName, Classes.Name;

So far so good. Unfortunately this way everyone ended up with Math 101 since, even though there isn’t an order specified it’s still most likely to pull in the order the rows were inserted. So let’s try throwing a ORDER BY NEWID() to get a random order of the rows as well.

SELECT Students.FirstName, Students.LastName, Classes.Name
FROM Students
CROSS APPLY (SELECT TOP (ABS(CHECKSUM(NEWID()) % 4) + 1) * 
			FROM Classes ORDER BY NEWID()) Classes
ORDER BY Students.FirstName, Students.LastName, Classes.Name;

Well, better, but now every student is getting the same number of rows and the same classes. But at least it’s a different set of classes each time I run the query. Off guess it’s because I’ve got NEWID() in the subquery twice now, but I can’t be sure. What I did notice though is that if I make the subquery correlated (use some value from the outer query) it fixed it.

SELECT Students.FirstName, Students.LastName, Classes.Name
FROM Students
CROSS APPLY (SELECT TOP (ABS(CHECKSUM(NEWID()) % 4) + 1) * 
			FROM Classes ORDER BY NEWID(), Students.FirstName) Classes
ORDER BY Students.FirstName, Students.LastName, Classes.Name;

And now I get a random set of classes for each student. Probably not something I’ll have to do very often but it did make for an interesting exercise.

4 thoughts on “Generate a random number of children for each parent

  1. Mike Stuart says:

    Kenneth, thanks for the great post. I was stepping through your code and ran the first SELECT statement after loading the tables.

    SELECT Students.FirstName, Students.LastName, Classes.Name FROM Students
    CROSS APPLY (SELECT TOP (ABS(CHECKSUM(NEWID()) % 4) + 1) * FROM Classes) Classes
    ORDER BY Students.FirstName, Students.LastName, Classes.Name;

    You indicated that you ended up with all students having the same class of ‘Math 101’. My results were quite different; as I reran the code I would end up with a different number of students, with different classes for each student.

    My environment: SELECT @@VERSION =
    Microsoft SQL Server 2016 (RTM-GDR) (KB4019088) – 13.0.1742.0 (X64) Jul 5 2017 23:41:17 Copyright (c) Microsoft Corporation Developer Edition (64-bit) on Windows 10 Pro 6.3 (Build 17134: )

    Any ideas or suggestions? TIA, and thanks again for the great posts

    • Thanks 🙂 The only thing I can think of is that the CROSS APPLY is running each time and occasionally returning no rows. That would cause it to exclude students. You could check by changing it to an OUTER APPLY. Not sure why it’s doing that though to be honest.

  2. Chad says:

    A co-worker sent me this article. I thought it was pretty cool, but thought I would comment because you ran into a pretty odd issue I’ve also run into in the past as well.

    You already figured it out, but if the query in an APPLY is not correlated, then sql only runs it one time and applies those results to every row. When I ran into it, I solved it by using a WHERE clause in the cross apply. I added something similar to:

    WHERE Students.StudentID = Students.StudentID

    But adding it to the order by is much cleaner, I like that.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 3,755 other subscribers

Follow me on Twitter

ToadWorld Pro of the Month November 2013
%d bloggers like this: