What’s the difference between Files and Filegroups?
March 28, 2018 by Kenneth Fisher
tl;dr: Filegroups are a logical construct used to separate tables and indexes from each other; files are the physical construct that actually stores the database's information (log and data).
When creating a database you’ll notice that each database is built of several components. There are two files, one for data and one for the log, the MDF and LDF files. Then there is the default filegroup PRIMARY. But that’s all just the default.
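To make those defaults visible, here is a minimal sketch of a CREATE DATABASE statement that spells out the pieces you normally get implicitly: one data file in the PRIMARY filegroup and one log file. The database name, logical file names, paths and sizes are all made up for illustration; adjust them for your own server.

```sql
-- Hypothetical names, paths and sizes; the folder C:\SQLData is assumed to exist.
CREATE DATABASE DemoDB
ON PRIMARY                              -- the default filegroup
    (NAME = N'DemoDB_Data',             -- logical name of the data file
     FILENAME = N'C:\SQLData\DemoDB.mdf',
     SIZE = 64MB,
     FILEGROWTH = 64MB)
LOG ON                                  -- the log file is listed separately, not under a filegroup
    (NAME = N'DemoDB_Log',
     FILENAME = N'C:\SQLData\DemoDB_log.ldf',
     SIZE = 32MB,
     FILEGROWTH = 32MB);
```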
Filegroups
Filegroups are a logical construct for the data in a database. They are there so that, if you want, you can separate tables and indexes from each other. If you look at partitioning you will almost certainly learn something about filegroups, because part of the construction of partitions is the option to put each partition into its own filegroup. In fact, since even non-partitioned tables/indexes are essentially single-partition tables/indexes, every table or index can be placed into a specific filegroup. The filegroup can be specified when creating a table or index; otherwise the object goes into the default filegroup (usually PRIMARY). So, if filegroups are a logical construct, what’s their physical representation?
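As a rough sketch of what "specifying the filegroup" looks like, the script below adds a second filegroup and then places a table in it while leaving one index in PRIMARY. Everything named here (DemoDB, FG_Archive, the file path, dbo.OrderArchive) is hypothetical. The filegroup needs at least one file before anything can be stored in it, so the script adds one.

```sql
-- Hypothetical database, filegroup, file and table names.
ALTER DATABASE DemoDB ADD FILEGROUP FG_Archive;

-- A filegroup cannot hold data until it has at least one file.
ALTER DATABASE DemoDB
ADD FILE
    (NAME = N'DemoDB_Archive1',
     FILENAME = N'C:\SQLData\DemoDB_Archive1.ndf',
     SIZE = 64MB,
     FILEGROWTH = 64MB)
TO FILEGROUP FG_Archive;
GO

USE DemoDB;
GO

-- The ON clause places the table (and its clustered index) in FG_Archive.
CREATE TABLE dbo.OrderArchive
(
    OrderID   int  NOT NULL,
    OrderDate date NOT NULL,
    CONSTRAINT PK_OrderArchive PRIMARY KEY CLUSTERED (OrderID)
) ON FG_Archive;

-- A nonclustered index can live in a different filegroup from its table.
CREATE NONCLUSTERED INDEX IX_OrderArchive_OrderDate
    ON dbo.OrderArchive (OrderDate)
    ON [PRIMARY];
```

Leaving off the ON clause would put both objects in whatever filegroup is currently marked as the default.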
Files
Files are the actual location where the data is stored. There are two types of files, transaction log files and data files.
Transaction log files
The transaction log is part of what allows the database to be ACID compliant. For those that don’t know, that means transactions are Atomic, Consistent, Isolated and Durable. I’ll let you follow the link for the exact definition of each as it’s outside of the scope of this discussion. However, the log is there, basically, to allow us to roll back a transaction until it’s been committed, and (through log backups) to restore transactions that happened between full backups. While you can have multiple log files, there aren’t very many good reasons to. It’s generally best practice to have only one transaction log file per database. Transaction log files are not part of filegroups.
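One way to see that log files sit outside of filegroups is to list a database's files next to their filegroups. This is only a sketch, and DemoDB is a hypothetical database name; the catalog views sys.database_files and sys.filegroups are standard.

```sql
USE DemoDB;   -- hypothetical database name
GO

-- Data files (type_desc = ROWS) match a row in sys.filegroups.
-- Log files have data_space_id = 0, so the LEFT JOIN returns NULL for them.
SELECT df.name          AS logical_file_name,
       df.type_desc,
       df.data_space_id,
       fg.name          AS filegroup_name,
       df.physical_name
FROM sys.database_files AS df
LEFT JOIN sys.filegroups AS fg
       ON fg.data_space_id = df.data_space_id
ORDER BY df.file_id;
```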
Data files
Data files, well, they contain the data. They are the physical representation of a filegroup, and each filegroup must have at least one file. A filegroup can, however, have more than one file. Usually, the additional files (and by additional I mean anything other than the first file in the PRIMARY filegroup) have the NDF extension.
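For illustration, adding a second data file to an existing filegroup might look something like this. The database name, logical file name, path and sizes are assumptions, not anything from a real system.

```sql
-- Hypothetical names and path; adds a second data file to the PRIMARY filegroup.
ALTER DATABASE DemoDB
ADD FILE
    (NAME = N'DemoDB_Data2',
     FILENAME = N'C:\SQLData\DemoDB_Data2.ndf',   -- NDF by convention only
     SIZE = 64MB,
     FILEGROWTH = 64MB)
TO FILEGROUP [PRIMARY];
```

The NDF extension is a naming convention rather than a requirement; SQL Server does not enforce file extensions.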
Now, you may be wondering, if I have two files in my filegroup but I want to put a table into just one of those files, how do I do it? The answer is: use a second filegroup. You can only target a filegroup, never an individual file. Within a filegroup the data is stored using proportional fill, meaning that SQL Server spreads writes across the files based on how much free space is available in each of them.
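Since proportional fill depends on the free space in each file, it can be handy to check what that free space actually is. The query below is a sketch of one way to do that for the data files in the current database; DemoDB is a hypothetical name, and the column aliases are mine.

```sql
USE DemoDB;   -- hypothetical database name
GO

-- size and FILEPROPERTY(..., 'SpaceUsed') are both reported in 8 KB pages,
-- so dividing by 128.0 converts them to MB.
SELECT fg.name                                              AS filegroup_name,
       df.name                                              AS logical_file_name,
       df.size / 128.0                                      AS size_mb,
       FILEPROPERTY(df.name, 'SpaceUsed') / 128.0           AS used_mb,
       (df.size - FILEPROPERTY(df.name, 'SpaceUsed')) / 128.0 AS free_mb
FROM sys.database_files AS df
JOIN sys.filegroups     AS fg
      ON fg.data_space_id = df.data_space_id   -- inner join leaves out log files
ORDER BY fg.name, df.file_id;
```

Running it before and after a large load gives a rough picture of how the new data was spread across the files of each filegroup.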
Summary
Files and filegroups are not the same. Files are physical, filegroups are logical.
Also please note, I’ve completely ignored things like FILESTREAM and memory-optimized filegroups. If you want to read about this subject in far more detail here is the BOL page for Files and Filegroups.
When data needs to be spread across all the data files, for example when a row is inserted, is the row spread across all the data files or written to just one? In other words, at the moment it is stored, does a single data row go into one data file or into several?
I’m not 100% certain (although I may do some testing later) but if I had to guess it’s by extent. I’d bet there is some algorithm that tells it which file to add a new extent to and that’s how the data is split between the files. It just wouldn’t make sense for it to be anything smaller than that given the way the file internals are handled.