 
    GentleRebuild is Powershell script for online index rebuild in
      high-load clustered MSSQL databases working 24/7 - Enterprise Edition
      is preferred, Standard Edition is supported with limitations.
    
    Detailed description: https://habr.com/ru/articles/761518/
    Script executes ALTER INDEX ... REBUILD/REORGANIZE in a
      safe controlled mode using 2 threads: one thread is executing the command
      while second one is checking the environment for several conditions. What
      is controlled:
    
      - High CPU
- Too active index rebuild activity can overflow the LDF file -
        transaction log backup could be not fast enough to flush the generated
        log to the disk
- High index rebuild activity can fill AlwaysOn queues (Send and
        Redo) above safe thresholds
- Or, despite the ONLINE option, rebuild can block other processes via
        schema locks - in such cases script "yields", stopping it work
        temporarily
- Controlled deadline time - deadline is a time when script should stop
        (not to interfere with the other important processes running on a fixed
        schedule)
- Exception list of tables and indexes - schemas named 'import', 'tmp',
        tables named 'tmp_%' etc.
- Indexes which can't be rebuilt with the options RESUMABLE=ON
        or even ONLINE=ON - such case indexes are skipped when size is
        above the defined thresholds
- Maximum duration of continuous index rebuild 
- Maximum size of indexes rebuild during a single run - in case the
        amount of total space for transaction log backups is limited or logs are
        used to roll over to STANDBY databases.
- Custom throttling conditions - review custom.ps1 and adjust
        for your needs throttling and lock sensitivity (if you download new
        version, don't overwrite your custom.ps1)
SQL server script
      
    
      - Execute script FRG_install.sql on a "DBA" database.
        In almost any environment there is dedicated "DBA" database - DBA,
        SYSSYS, Optimization, names differ. In the worst case, if there is no
        such database, use msdb.
- Execute FRG_FillUsageStats if needed (check changes in v1.37
        for more details)
- Execute procedure FRG_FillSizeStats to fill a table
        with object list and sizes. This procedure usually finishes it's work it
        few seconds. You can limit it to a singled database
- Execute procedure FRG_FillFragmentation to analyze a
        level of fragmentation. It could take up to a day or more to analyze
        30-100Tb (to limit the impact, check '2-server mode'), defult analysis
        mode is DETAILED. This is why powershell program doesn't analyze the
        fragmentation level during it's run, but relies on the previously
        collected information. First parameter of this stored procedure allows
        you to limit the analysis by a single database. By default all non
        system databases are analyzed.
- When script finishes the index rebuild, it updates the size and
        fragmentation info using the stored procedure FRG_FillFragmentationOne.
        Typically fragmentation level drops below the threshold, so on the next
        run an index won't be affected
- Powershell script relies on the static previously collected
        information, not on the actual fragmentation level NOW. (which can
        gradually deteriorate after an analysis). This is why steps 2 and 3
        should be executed again from time to time.
- Underlying tables keep all the history of sizes and fragmentation
        levels, the last value can be selected from a view FRG_last:
 
    Powershell Script parameters
      
    When you run powershell script, supply a configuration file name as first
      parameter, for example:
    powershell .\GentleRebuild.ps1 param
    where param.ps1 - is powershell files setting parameter variables inside.
      Check config_example.ps1
    All parameters in detail:
    
      - Where and what to rebuild:
 
          - server - server name (Integrated Security is
            used, if you do need SQL login edit the ps1 file)
- replicaserver - read only replica where fragmentation
            analysis is executed to limit the stress on the mains server, check
            '2-server mode'. if parameter is blank or the same as server, all
            queries run against the same server
- dbname - Database name to defragement. You can
            provide comma separated list. * means all non-system databases.
- workdb - DBA database (see above)
- threshold - fragmentation threshold in percent.
- extrafilter - Additional filter, which is applied
            against FRG_last view. Useful filter examples:
 
              - page_count>100 to skip tiny tables. Even after rebuild they
                could be too fragmented because they don't have enough pages to
                be completely populated
- TotalSpaceMb<1000000 - Skip huge indexes, they need extra
                care.
- exceptions by SchemaName, Tablename, IndexName etc
 
- Reorganize - 0 or 1. use 1 for INDEX REORGANIZE.
- AllowNonResumable - size in Gb. If an index can't
            be rebuilt using RESUMABLE=ON, an attempt is made to rebuild
            it with ONLINE=ON. However, in that case process can't be
            interrupted, and program can't yield to locks, unless killnonresumable=1
              is specified. This is why index should be small enough. It
            index size is bigger than that value it is skipped. Most HEAP
            indexes fall into that category.
- AllowNonOnline - size in Gb. Some indexes can't
            be rebuilt even using ONLINE=ON. Then they are rebuild
            OFFLINE, locking other processes, and this process is not
            interrupted unless killnonresumable=1 is specified. They
            should be smaller than a threshold, or they are skipped. Obviously AllowNonOnline
            should be even smaller than AllowNonResumable 
 
 
- How to rebuild
 
          - deadline - string in "HH:MM" format or an empty
            string. If current time is greater than deadline, tomorrow time is
            assumed. What deadline affects:
            
              - If deadline is reached, and there are still indexes to work
                on, process stops without starting a work on a next index
- If during a rebuild a projected time of completion (ETA) falls
                behind the deadline, and when work is between 2% and 3% (not far
                enough), then rebuild is terminated (using ABORT). However, it
                doesn't guarantee that work would always finish before the
                deadline, as process can slow down later. In such cases manual
                intervention is required (check "interactive menu")
- if harddeadline =1, index resumable rebuild is
                terminated at any stage, not only between 2% and 3%, to
                guarantee that there is no activity outside of the maintenance
                window
- If ETA is behind the deadline, the progress messages are
                highlighted with yellow to attract an attention.
 
- harddeadline - 0 (default). When 1, forces to abort
            operations no matter what progress is.
- starttime - string in "HH:MM" format or an empty string. If
            current time is greater than starttime, tomorrow time is assumed.
            Forces script to wait until that time is  reached - typically a
            maintenance window start.
- rebuildopt - rebuild options for ordinary
            indexes (CLUSTERED, NONCLUSTERED, HEAP). Don't use MAX_DURATION, use
            chunkminutes instead. Don't use SORT_IN_TEMPDB: for
            RESUMABLE it is not supported, for offline rebuilds there is a s
            special parameter sortintempdb. Regarding MAXDOP:
            
              - MAXDOP = 1 - slowly without much pressure on a server
- MAXDOP = 2 - normal work, recommended option
- MAXDOP = 4 - aggressive work
- It is not recommended to use MAXDOP>4, you will reach the
                AlwaysOn limitations, and the process will be throttled
- Note: for some indexes MAXDOP is ignored and server uses 1-2
                threads no matter what MADOP is.
 
- columnstoreopt - similar options for COLUMNSTORE
            indexes (COLUMNSTORE index support is experimental)
- reorganizeopt - options for INDEX REORGANIZE
- relaxation - number of seconds process sleeps
            when throttled by throttling conditions (LDF use size, AlwaysOn
            queues, locking)
- checkevery - check for locks every N seconds, default 15
- maxcpu - percentage of the CPU for the last 10 minutes,
            process is throttled if cpu is higher
- maxlogused - size in Mb used in log (LDF). When
            used space is greater than this value, script stops (throttles) for
            (relaxation) seconds and waits until the condition is cleared
            (transaction log backup carries away log). Value 0 means 'don't
            check'
- maxlogusedpct - Same but in percent to the total
            size of transaction log. Value 0 means 'don't check. Typically one
            of these 2 conditions is verified.
- maxqlen - Max size of AlwaysOn queues in Mb (log
            send queue + redo queue) for all databases. If value is above the
            threshold, process is throttled. To skip this check specify a very
            high value
- maxdailysize - maximum work size of indexes
            rebuilt in Mb. Condition is verified when work on a new index is
            started. 0 means 'don't check'
- chunkminutes - similar to MAX_DURATION, but is
            implemented using Powershell. 0 means no max duration, unless there
            are throttling conditions. Time is counted since last throttling.
- killnonresumable - when 1, operations without RESUMABLE=ON
            can be killed. You can increase values of AllowNonResumable
            and AllowNonOnline in hope that no locks will be
            detected during the operation (so value 1 is an optimistic
            strategy).
- forceoffline - Enterprise edition works like Standard one,
            all rebuilds are offline
- sortintempdb - for offline rebuilds only, size in Mb. if
            index is smaller than that size, SORT_IN_TEMPDB=ON is used. It index
            is too big (bigger than the value provided) it will be OFF.
            Resumable and online rebuilds can't use this option.
- offlineretries - when rebuild is not resumable and killnonresumable=1,
            index operation can be aborted when there are locks. Script will
            retry the operation soon after. But after the specified number of
            retries if would give up and skip this index
- orderby - order in which tables are processed, for more
            details check changes in v1.37
 
Interactive menu
    By pressing Ctrl/C while script is progressing (not throttling) you enter
      an interactive menu (might take up to 15 seconds for the menu to appear):
    
    For non-resumable index rebuilds, option S - STOP is not available
    For REORGANIZE, there is an extra option - R - stops and recalculates new
      stats, because on REORGANIZE already done, then process goes to the nest
      index in a list.
    It displays the current rebuild command and waits for a single character
      command. Don't leave it waiting for a command - it such state throttling
      conditions are not controlled. After command 'S' you can change the
      parameters and restart the script, it would continue from the same index.
      However MAXDOP can't be changed once rebuild is started.
    Additional comments:
      
    
      - Script starts from smaller indexes and is progressing towards the
        bigger ones
- Percentage is calculated based on the data in a table index_resumable_operations,
        if operation is with RESUMABLE option. Otherwise, there is a rough
        estimation based on the IO count vs page_count (rough estimation ... %).
        For COLUMNSTORE percentage is calculated based on the number of
        ROWSTOREs rebuilt.
- Sometimes after throttling the percentage doesn't increase and you
        will see ETA: (unknown). SQL server is seeking for a place from where to
        continue (rewinding effect). Sometimes it could take a long time 
- You can check a global progress on a console title
- ETA for the global progress is calculated when at least 1% of the
        total work is done
- Don't leave indexes in a suspended resumable state. SQL server still
        has to track all the changes and some queries can run slower.
- Indexes in a stopped resumable state can fail over to another server,
        and rebuild can continue on another server (!!!)
- To track changes for ONLINE indexes, SQL has to change execution plans
        which update these table. In most cases there is only mild performance
        degradation, but in few cases (with MERGE) the difference could be
        dramatic
- Sometimes index rebuild can slow down processes even on unrelated
        databases for unknown reasons. Throttling or  chunkminutes solves
        this problem. It is not known where the magic is, but looks like
        Microsoft knows something and had invented MAX_DURATION for that reason.
- If you chose chunkminutes too small the rebuild performance
        could suffer because of the "rewinding effect"
- COLUMNSTORE support is experimental.
      Standard Edition: tips and tricks
       
    
      - All throttling conditions are checked only between the operations,
        when a new operation starts. Exception: when other processes are locked
        and killnonresumable=1
- use killnonresumable=0 for "hard" defrag which ignores locks,
        when you have a maintenance window (use deadline parameter as
        well)
- you can benefit from setting sortintempdb value to make
        rebuild faster, also you can increase MAXDOP to reduce the probability
        of locks
- set offlineretries to skip "hot" indexes which can't be been
        rebuilt without causing too much trouble
- the most important is defining the custom procedure VictimClassifier,
        which allows you defide different categories for different connections,
        and for each category to define how much it can wait for locks (for
        example, job named XXX can wait for 1 hour, job YYY for 10 minutes,
        other processes are interactive and process should yield immediately)
- progress indicator is available for non-resumable operations - even it
        is not provided by SQL server, quite accurate estimation is done based
        on other system counters
      INDEX REORGANIZE: tips and tricks
       
    
      - this operation is resumable as well, it works on Standard Edition, but
        it is 3-5 time slows then INDEX REBUIILD on non-clustered indexes and up
        to 20x on CLUSTERED
- there is some estimation of the percentage of the work done for
        REORGANIZE, it is +/- 15% accurate in most cases
- sometimes the reorganized area hits the 'hot spot' where data is
        inserted on a systems under heavy load, and REORGANIZE never ends. For
        that reason if estimated percentage reaches 120%, process is terminated
        and statistics is recalculated.
- reorganize is throttled the same way other operations. It can cause
        locking - typically on schema locks.
Rebuild Log
      
    Is saved into a table FRG_Log
    
      Two server mode (taking benefits of the readable replica)
       
    Index analysis creates huge read stress on a server, reading all tables.
      If possible, it makes sense to move this process to a secondary replica.
      Simply provide replica server name in replicaserver parameter.
      Note: it is assumed that there is a DBA database there, and that
      database is read-write, so, it is not part of the AlwaysOn (or for that
      group replica server is primary). 
    You should run stress-full FRG_FillFragmentation there.
      After finishing each index rebuild, Powershell script will also run FRG_FillFragmentationOne
      on the secondary server to update fragmentation statistics. 
    Note: before running FRG_FillFragmentationOne it waits
      for the current LSN on the primary database to be replicated to the
      replica server.
    
      
    Change Log: 
    v1.10
    
      - Introducing custom throttle events. Works similar to all other
        throttling conditions (logs, locks, AlwaysOn). For an idea check
        function CustomThrottling
- New config parameter: killnonresumable: 0 (please add to
        parameter file). When 1, operations without RESUMABLE=ON can be killed.
        You can increase values of AllowNonResumable and AllowNonOnline
        in hope that no locks will be detected during the operation (so value 1
        is an optimistic strategy).
- Minor bug fix in FRG_FillFragmentationOne - please alter
        procedure from FRG_install.sql file
v1.11 
    
      - maxcpu parameter and throttling based on cpu for the last 10
        minutes.
- partial support of the Standard Edition - all operations are offline,
        throttling is between operations only, with the exception of having
        other processes locked. If killnonresumable=1, operation will be
        aborted but will be retried offlineretries times.
- using forceoffline=1 you can force Enterprise Edition to work
        as Standard one, all rebuilds will be offline
v1.12  
    
      - sortintempdb parameter uses SORT_IN_TEMPDB=ON if rebuild is
        offline (option is not compatible with online and resumable operations),
        and if index is smaller the value of sortintempdb (value in Mb).
        For example, value 100000 means that indexes smaller than 100Gb would
        use SORT_IN_TEMPDB=ON
- custom throttle events can now return values 1 (soft throttle, only
        between operations, so indexes are not left in resumable state) and 2 -
        hard, throttle immediately.
- custom throttle events and other throttling conditions affect offline
        rebuilds only between the operations, exception are locking.
- quite accurate progress percentage estimation for non-resumable
        operations
- sanity checks before run (that FRG_ tables exist and are populated)
 
    v1.13  
    
      - VictimClassifier is used extensively on Standard Edition.
        Kills rebuilds which lock other processes, when:
 
        - victim process waits longer than limit, given by VictimClassifier
- victim process wait time + projected (estimated) rebuild time to
          completion (based on percentage) will be longer than limit, given by
          VictimClassifier
- but (in both cases) the percentage done is not behind "point of no
          return", where rollback would take longer than completion. Depending
          on the actual MAXDOP, it is 75%-50% (the higher MAXDOP, the lower the
          point of no return)
v1.13  
    
      - 2 custom functions, CustomThrottling and VictimClassifier
        are extracted into a separate file custom.ps1
 
v1.2
      - Configuration file now is no longer JSON, but *.ps1. Reasons
 
        - I do need comments in configuration file
- I do need multi-line strings (here-strings) for the filters.
- YAML is not natively supported by Powershell
v1.23
      - new heuristics for INDEX REBUILD OFFLINE
 
    v1.26  
    
      - index reorganize now supported with progress estimation
- use option $reorganize = 1
v1.30   
    
      - multiple bug fixes and optimizations
- 2-server mode
v1.32   
    
      - FRG_last view now stores compression type. Please regenerate stats for
        new version! Underlying tables had changed! (but .ps1 code is compatible
        with the old ones)
- checkevery parameter - can check locks with any frequency, if
        15 seconds is too long.
- text log has the same name as config file
 
    
    v1.33  
    
      - Supports SAMPLED and LIMITED index analysis, default is still DETAILED
- FRG_FillSizeStats can be limited by a single database
v1.34
    
      - harddeadline = 1 forces to stop at deadline no matter what
- starttime = wait for start time (used to delay for maintenance
        window in interactive mode)
v1.35v1.36   
    
      - Added FileGroupName to FRG_last and underlying views, so script can
        filter by Filegroup. You need to recreate script tables
        (uninstall/install)
v1.37   
    
      - new parameter, orderby to control the processing order
- "TotalSpaceMb"  default - by size from the smallest
- "frag_pct desc"  from the worst
- "frag_count desc"  from highest entropy
- "activity desc"  most active first
- "activity*frag_count desc"  extra workload for read access
- "activity*(100-density) desc"  potential space benefit
        after rebuild
Note: new column activity is calculated based on the index usage
      statistics from sys.dm_db_index_usage_stats, value is calculated
      as sum of seeks and lookups and the number of scans, multiplied by table
      size. Value is populated by procedure FRG_FillUsageStats which
      must be called before FRG_FillFragmentation.
    On two server configuration, where activity happens on the primary node,
      but fragmentation analysis is done on the secondary node, information from
      dm_db_index_usage_stats can be transferred using procedure FRG_PrintUsageStats
    
    
    Check other projects: https://www.actionatdistance.com/