Tools

Tools: Latest: Fast Fragment Scanning for Tables in GBase 8a

2026-04-29 0 views admin

Why Fast Fragment Detection Matters

How It Works

Deploying and Running the Script

1. Installation

2. Configuration

3. Execution

Full Script Reference

Why This Works Disk fragmentation inside a database can silently eat up storage and degrade I/O performance. In GBase 8a, the China-domestically developed MPP cluster database from GBASE, frequent inserts and deletes leave "holes" in table files over time. Spotting these fragmented tables quickly is a common operational challenge—traditional methods that scan metadata tables are slow and don't scale well across large clusters. This article introduces a lightweight, shell-based approach that uses the Linux filefrag utility to scan for fragmented tables directly from the filesystem layer, significantly reducing detection time in a gbase database environment. GBase 8a's data files can become heavily fragmented under write-intensive workloads. Fragmentation not only wastes disk space but also adds random I/O overhead during scans. Conventional approaches that query internal metadata tables struggle with performance at scale. By contrast, filefrag reads filesystem-level extent maps, making it orders of magnitude faster for bulk scanning. The script iterates through all user tables (obtained from gbase.table_distribution), locates the data directory for a specific shard suffix (e.g., n1), and runs filefrag -v against each file. It extracts the number of extents (contiguous disk regions) per file and calculates the average. If a table's average extent count exceeds the threshold (default: 2), the table is flagged as fragmented. Place the sweep.sh script under /home/gbase/sweep on a composite node. Adjust the variables at the top of the script: After the scan finishes, the /home/gbase/sweep/log/ directory will contain one file per flagged table, named database_name.table_name. These files serve as a direct to-do list for defragmentation operations. The key insight is offloading the work to the filesystem layer. GBase 8a's gbase.table_distribution system table gives the authoritative list of table names, and filefrag handles the low-level extent counting without touching the gbase database engine. The script then parallelizes the scanning using a named-pipe-based concurrency model, making it practical to run even across hundreds of tables. GBASE continues to provide robust tools for managing GBase 8a clusters, and combining OS-level utilities with database metadata is a pattern that can be extended to other operational tasks. Templates let you quickly answer FAQs or store snippets for re-use. Are you sure you want to ? It will become hidden in your post, but will still be visible via the comment's permalink. Hide child comments as well For further actions, you may consider blocking this person and/or reporting abuse

Code Block

Copy

gbase.table_distribution filefrag -v /home/gbase/sweep threads_num sh sweep.sh sh sweep.sh sh sweep.sh /home/gbase/sweep/log/ database_name.table_name #!/bin/bash # author: Bryan zhao # date: 2025-07-29 user="gbase" passwd="******" threads_num=2 # Concurrency suffix="n1" # Typically scan one shard only avg_sum=2 # Mark table if avg extents exceed this > tb.list if [ ! -d "`pwd`/log" ]; then mkdir -p "`pwd`/log" else rm -rf "`pwd`/log" mkdir -p "`pwd`/log" fi cli="${GCLUSTER_HOME}/bin/gbase -u${user} -p${passwd} -N" $cli -e "SELECT dbname,tbname FROM gbase.table_distribution where dbname not in ('information_schema','performance_schema','gbase','gctmpdb','gclusterdb')" > tb.list child() { local dbname=$1 tbname=$2 suffix=$3 avg_sum=$4 local mulu="${GBASE_BASE}/userdata/gbase/${dbname}/sys_tablespace/${tbname}_${suffix}" [ ! -d "$mulu" ] && { echo "Error: Directory $mulu not found"; return 1; } local avg_extents=$(find "$mulu" -type f -exec filefrag -v {} \; 2>/dev/null | grep "extent" | awk '{sum+=$2} END {print (NR>0)? sum/NR : 0}') if (( $(echo "$avg_extents > $avg_sum" | bc -l) )); then touch "`pwd`/log/${dbname}.${tbname}" fi } tmp_fifofile="/tmp/$$_`date +%N`.fifo" mkfifo $tmp_fifofile exec 6<>$tmp_fifofile rm -rf $tmp_fifofile for i in `seq 1 ${threads_num}` do echo >&6 done while read -r db table do read -u6 { child ${db} ${table} ${suffix} ${avg_sum} echo >&6 }& done < tb.list wait exec 6>&- #!/bin/bash # author: Bryan zhao # date: 2025-07-29 user="gbase" passwd="******" threads_num=2 # Concurrency suffix="n1" # Typically scan one shard only avg_sum=2 # Mark table if avg extents exceed this > tb.list if [ ! -d "`pwd`/log" ]; then mkdir -p "`pwd`/log" else rm -rf "`pwd`/log" mkdir -p "`pwd`/log" fi cli="${GCLUSTER_HOME}/bin/gbase -u${user} -p${passwd} -N" $cli -e "SELECT dbname,tbname FROM gbase.table_distribution where dbname not in ('information_schema','performance_schema','gbase','gctmpdb','gclusterdb')" > tb.list child() { local dbname=$1 tbname=$2 suffix=$3 avg_sum=$4 local mulu="${GBASE_BASE}/userdata/gbase/${dbname}/sys_tablespace/${tbname}_${suffix}" [ ! -d "$mulu" ] && { echo "Error: Directory $mulu not found"; return 1; } local avg_extents=$(find "$mulu" -type f -exec filefrag -v {} \; 2>/dev/null | grep "extent" | awk '{sum+=$2} END {print (NR>0)? sum/NR : 0}') if (( $(echo "$avg_extents > $avg_sum" | bc -l) )); then touch "`pwd`/log/${dbname}.${tbname}" fi } tmp_fifofile="/tmp/$$_`date +%N`.fifo" mkfifo $tmp_fifofile exec 6<>$tmp_fifofile rm -rf $tmp_fifofile for i in `seq 1 ${threads_num}` do echo >&6 done while read -r db table do read -u6 { child ${db} ${table} ${suffix} ${avg_sum} echo >&6 }& done < tb.list wait exec 6>&- #!/bin/bash # author: Bryan zhao # date: 2025-07-29 user="gbase" passwd="******" threads_num=2 # Concurrency suffix="n1" # Typically scan one shard only avg_sum=2 # Mark table if avg extents exceed this > tb.list if [ ! -d "`pwd`/log" ]; then mkdir -p "`pwd`/log" else rm -rf "`pwd`/log" mkdir -p "`pwd`/log" fi cli="${GCLUSTER_HOME}/bin/gbase -u${user} -p${passwd} -N" $cli -e "SELECT dbname,tbname FROM gbase.table_distribution where dbname not in ('information_schema','performance_schema','gbase','gctmpdb','gclusterdb')" > tb.list child() { local dbname=$1 tbname=$2 suffix=$3 avg_sum=$4 local mulu="${GBASE_BASE}/userdata/gbase/${dbname}/sys_tablespace/${tbname}_${suffix}" [ ! -d "$mulu" ] && { echo "Error: Directory $mulu not found"; return 1; } local avg_extents=$(find "$mulu" -type f -exec filefrag -v {} \; 2>/dev/null | grep "extent" | awk '{sum+=$2} END {print (NR>0)? sum/NR : 0}') if (( $(echo "$avg_extents > $avg_sum" | bc -l) )); then touch "`pwd`/log/${dbname}.${tbname}" fi } tmp_fifofile="/tmp/$$_`date +%N`.fifo" mkfifo $tmp_fifofile exec 6<>$tmp_fifofile rm -rf $tmp_fifofile for i in `seq 1 ${threads_num}` do echo >&6 done while read -r db table do read -u6 { child ${db} ${table} ${suffix} ${avg_sum} echo >&6 }& done < tb.list wait exec 6>&- gbase.table_distribution - user — database user (typically gbase) - passwd — corresponding password - threads_num — number of parallel scanning threads (default: 2) - suffix — shard suffix to scan (usually n1) - avg_sum — average extent threshold to trigger a flag (default: 2)

Share this article

Twitter Facebook LinkedIn Reddit

🏷️ Tags

toolsutilitiessecurity toolslatestfragmentscanningtablesgbase

More from Tools

Tools: Update: Troubleshooting Datacenter: Resolving Apache Service Reachability On Server

2026-04-29 0

Tools: Breaking: Bash got sharper today.

2026-04-29 0

Tools: Como restaurar chaves SSH

2026-04-29 0

Tools: Update: Setting up the perfect Claude Code environment in 2026

2026-04-29 0

Trending

1

CVE-2025-61481: Critical Remote Code Execution Vulnerability in MikroTik RouterOS & SwitchOS

2025-10-27 • 189 views

2

CVE-2025-43939: Dell Unity OS Command Injection (High)

2025-10-30 • 148 views

3

Google disputes false claims of massive Gmail data breach

2025-10-30 • 130 views

4

Microsoft: DNS outage impacts Azure and Microsoft 365 services

2025-10-30 • 88 views

5

3.5B Accounts, 1 Critical Flaw: Meta Closes WhatsApp Data-Harvesting

2025-11-25 • 81 views

InfinitSec - Latest Cybersecurity, Technology & Gaming News

Tools: Latest: Fast Fragment Scanning for Tables in GBase 8a

Why Fast Fragment Detection Matters

How It Works

Deploying and Running the Script

1. Installation

2. Configuration

3. Execution

Full Script Reference

🏷️ Tags

More from Tools

Tools: Update: Troubleshooting Datacenter: Resolving Apache Service Reachability On Server

Tools: Breaking: Bash got sharper today.

Tools: Como restaurar chaves SSH

Tools: Update: Setting up the perfect Claude Code environment in 2026

Trending

CVE-2025-61481: Critical Remote Code Execution Vulnerability in MikroTik RouterOS & SwitchOS

CVE-2025-43939: Dell Unity OS Command Injection (High)

Google disputes false claims of massive Gmail data breach

Microsoft: DNS outage impacts Azure and Microsoft 365 services

3.5B Accounts, 1 Critical Flaw: Meta Closes WhatsApp Data-Harvesting