Tools: Useful Linux Commands For Data Engineers
2026-01-26
admin
Linux servers are the most widely used compute environment for large-scale data systems. Mastering Linux helps data engineers manage data pipelines and process data efficiently. This article is a deep dive into the Linux commands most relevant to data engineering tasks.

## File and Directory Operations
These commands navigate and manipulate the Linux file system. They are the file-system equivalent of CRUD operations. Data engineers can use them to inspect logs and raw data dumps.

## File System and Storage Management
The file system partitions, organizes and stores data on disk.

cat /proc/partitions ~ view all storage disks and partitions recognized by the system
fdisk /dev/sda ~ create, manage and delete partitions on /dev/sda
sudo mkfs.xfs /dev/sdb1 ~ create an XFS filesystem on partition /dev/sdb1
sudo mount /dev/sdb1 /myfolder ~ attach the storage to the /myfolder directory, which acts as the mount point
df -h /myfolder ~ verify that the storage space is available
pvcreate /dev/sdb1 ~ initialize a physical volume for use with the Logical Volume Manager

Because data engineers work on data-intensive systems, they need to handle partitions and filesystems carefully to avoid losing data. Back up your data before running these commands.

## File Attributes and Permissions
Linux file permissions are a security mechanism that determines who can read, write or execute files on a system. With these commands, a data engineer can restrict access to sensitive data and protect files from accidental modification by setting strict permissions.

## User and Group Management
Use the useradd command to create a new user. View user information with the id command. To set a password for the new user, run passwd. All existing users are listed in the /etc/passwd file. Switch to a different user with su.

To maintain least-privilege access to resources, data engineers need to manage users and groups properly. Use regular user accounts instead of the root user to limit access.

## Networking and Security
Network security on Linux rests on four key components: firewalls filter traffic, encryption protects data in transit, authentication verifies user identities, and monitoring analyzes traffic.

Using UFW, a firewall front end for iptables, you can check the currently registered application profiles with ufw app list. To enable an application profile, run ufw allow with the profile name. A rule can also specify a port instead of a profile name. Once the rules are in place, enable the firewall with ufw enable.

Use sudo ss -tulnp to show listening ports (pipe the output to grep to filter it) and ping to test connectivity.

## File Compression and Encryption
Compressing a file reduces the storage it needs and speeds up data transfer. Use gzip to compress a single file. Archiving combines multiple files into a single archive file. Create a simple archive with tar -cf and extract it with tar -xf. Encryption lets you send data safely over the internet. Use GPG or openssl for encryption.

Data engineers can compress raw log files to save space, then encrypt the archives before uploading them to cloud storage.

## Editors
The most popular text editors on Linux are Nano and Vim. Both nano file.txt and vi file.txt open file.txt, which is created on save if it does not already exist.

## File transfer commands
These commands handle file transfer, either locally or to a remote machine. With them, a data engineer can efficiently sync files from a local server to a remote server.

## Conclusion
This article covered several categories of Linux commands and how data engineers use them.

## References
How to Manage Linux Storage
Linux file permissions explained
Differences between archiving and compression
COMMAND_BLOCK:
# returns the current working directory
pwd
# lists directory contents including files and subdirectories
ls
# includes hidden files in the list
ls -a
# changes the current working directory
cd /path/to/directory
# deletes a file (use rm -r to delete a directory and its contents)
rm filename
# creates an empty file
touch filename
# prints the contents of a file
cat filename
# quickly view the first or last lines of a file without opening the entire file
head filename
tail filename
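As a rough illustration of how these commands combine when inspecting a log, the sketch below wraps wc, head and tail in one helper. The function name preview_file and the sample file app.log are made up for this example; they are not standard commands or paths.

```shell
# preview_file FILE [N]: print the line count, then the first and last N lines.
# preview_file is a hypothetical helper name, not a standard command.
preview_file() {
    file=$1
    n=${2:-5}
    echo "lines: $(wc -l < "$file" | tr -d ' ')"
    echo "--- first $n ---"
    head -n "$n" "$file"
    echo "--- last $n ---"
    tail -n "$n" "$file"
}

# Demo on a throwaway file so nothing real is touched.
demo_dir=$(mktemp -d)
seq 1 100 > "$demo_dir/app.log"
preview_file "$demo_dir/app.log" 2
rm -r "$demo_dir"
```

On a multi-gigabyte dump this is usually much faster than opening the file in an editor, because head and tail only read the part of the file they print.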
CODE_BLOCK:
/dev ~ directory holding device files, including the storage disks
sd ~ the storage disk
a ~ the first disk
1 ~ first partition on the disk
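To make the naming scheme concrete, here is a small sketch that splits a device path into its disk and partition parts. describe_device is a hypothetical helper written for this example, and it only understands the /dev/sdXN scheme described above (NVMe names such as nvme0n1p1 follow a different pattern).

```shell
# describe_device PATH: split a /dev/sdXN style path into disk and partition.
# Hypothetical helper; handles only the sdXN naming scheme.
describe_device() {
    name=${1#/dev/}        # strip the /dev/ directory prefix
    disk=${name%%[0-9]*}   # drop the trailing partition number, if any
    part=${name#"$disk"}   # whatever remains is the partition number
    echo "disk=$disk partition=${part:-none}"
}

describe_device /dev/sda1   # prints: disk=sda partition=1
describe_device /dev/sdb    # prints: disk=sdb partition=none
```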
COMMAND_BLOCK:
# returns file metadata including file permissions
ls -l
# makes a file immutable
chattr +i filename
# list file attributes
lsattr
# add or remove execute permissions
chmod +x filename
chmod -x filename
# set read/write for owner, read-only for group and others
chmod 644 myapp.py
# change file ownership
chown username filename
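A quick way to confirm that a permission change took effect is to read the mode back after setting it. The sketch below does that in a temporary directory. lock_down is a hypothetical helper name, and the example assumes GNU stat (the -c option), which most Linux distributions ship.

```shell
# lock_down FILE: set mode 644 (owner read/write, group and others read-only),
# then print the resulting octal mode. Hypothetical helper; assumes GNU stat.
lock_down() {
    chmod 644 "$1"
    stat -c '%a' "$1"
}

work=$(mktemp -d)
touch "$work/myapp.py"
chmod 777 "$work/myapp.py"    # deliberately too permissive
lock_down "$work/myapp.py"    # prints 644
rm -r "$work"
```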
COMMAND_BLOCK:
# create a new user
sudo useradd username
# create a new group
sudo groupadd groupname
# -m ~ creates the home directory for the user
sudo useradd -m username
# add an existing user to a group (-a appends, -G names the supplementary group)
sudo usermod -aG groupname username
COMMAND_BLOCK:
# returns the UID, GID and group memberships
id username
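Since id exits with a non-zero status when the account does not exist, it can also serve as an existence check in setup scripts. The following is a minimal sketch of that idea. user_exists is a hypothetical helper name, and etl_user in the comment is a made-up account name.

```shell
# user_exists NAME: succeed if the account exists, fail otherwise.
# Hypothetical helper built on the exit status of `id -u`.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

if user_exists root; then
    echo "root exists"
fi

# Typical use in a provisioning script (etl_user is a made-up name):
#   user_exists etl_user || sudo useradd -m etl_user
```

Guarding useradd this way keeps a script safe to run more than once, since useradd fails if the user is already present.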
COMMAND_BLOCK:
sudo passwd username
CODE_BLOCK:
cat /etc/passwd
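The raw /etc/passwd output mixes system accounts with human ones. The sketch below filters a passwd-format file down to regular login accounts with awk. list_login_users is a hypothetical helper, and the UID threshold of 1000 is a common distribution default rather than a guarantee, so treat it as an assumption.

```shell
# list_login_users [FILE]: print account names with UID >= 1000,
# skipping the conventional "nobody" UID 65534. Hypothetical helper;
# the 1000 threshold is an assumed distribution default.
list_login_users() {
    awk -F: '$3 >= 1000 && $3 != 65534 { print $1 }' "${1:-/etc/passwd}"
}

# Demo against a sample file in passwd format.
sample=$(mktemp)
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'etl:x:1001:1001::/home/etl:/bin/bash' \
    'nobody:x:65534:65534::/nonexistent:/usr/sbin/nologin' > "$sample"
list_login_users "$sample"    # prints etl
rm "$sample"
```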
CODE_BLOCK:
# the hyphen starts a login shell for the target user
su - username
COMMAND_BLOCK:
sudo ufw app list
COMMAND_BLOCK:
sudo ufw allow appname
COMMAND_BLOCK:
# check if ufw is running
sudo ufw status
# allow ssh
sudo ufw allow 22
# specify port ranges (a protocol is required for ranges)
sudo ufw allow 6000:6009/tcp
# allow connections from an IP address
sudo ufw allow from 201.8.139.4
COMMAND_BLOCK:
sudo ufw enable
COMMAND_BLOCK:
# results in filename.gz
gzip filename
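Note that plain gzip replaces the original file with the compressed one. If you want to keep the original and see how much space you saved, something like the sketch below works. compress_copy is a hypothetical helper name, and the generated app.log is sample data for the demo only.

```shell
# compress_copy FILE: write FILE.gz next to FILE, keep the original,
# and report both sizes in bytes. Hypothetical helper.
compress_copy() {
    gzip -c "$1" > "$1.gz"
    echo "original=$(wc -c < "$1" | tr -d ' ') compressed=$(wc -c < "$1.gz" | tr -d ' ')"
}

work=$(mktemp -d)
# Generate a repetitive sample log; real logs tend to compress well too.
awk 'BEGIN { for (i = 0; i < 1000; i++) print "INFO pipeline step finished" }' > "$work/app.log"
compress_copy "$work/app.log"
gzip -t "$work/app.log.gz" && echo "archive is intact"
rm -r "$work"
```

gzip -c writes to standard output instead of replacing the file, and gzip -t checks the integrity of the compressed file without extracting it.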
COMMAND_BLOCK:
# -cf creates and names the archive
tar -cf myarchive.tar app.py main.py project/
CODE_BLOCK:
tar -xf myarchive.tar
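In practice, archiving and compression are often done in one step by adding -z to tar. The sketch below shows a full round trip under that assumption: pack a directory, list the archive, and restore it elsewhere. pack_dir and unpack_to are hypothetical helper names, and all paths are temporary demo paths.

```shell
# pack_dir SRC_DIR ARCHIVE: create a gzip-compressed tar archive of SRC_DIR.
# -C switches to the parent directory first so the archive stores relative paths.
pack_dir() {
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# unpack_to ARCHIVE DEST_DIR: extract the archive into DEST_DIR.
unpack_to() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

work=$(mktemp -d)
mkdir -p "$work/project"
echo 'print("hello")' > "$work/project/app.py"

pack_dir "$work/project" "$work/project.tar.gz"
tar -tzf "$work/project.tar.gz"                  # list contents without extracting
unpack_to "$work/project.tar.gz" "$work/restore"
cmp "$work/project/app.py" "$work/restore/project/app.py" && echo "round trip ok"
rm -r "$work"
```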
COMMAND_BLOCK:
# File encrypted using passphrase
gpg -c filename.txt
# AES encryption
openssl enc -aes-256-cbc -in data.txt -out data.enc
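For scripted use you usually need decryption too, and a way to pass the passphrase without typing it. The sketch below is one possible approach, not the only one. encrypt_file and decrypt_file are hypothetical wrappers, BACKUP_PASSPHRASE is an assumed environment variable name, and the -pbkdf2 option assumes OpenSSL 1.1.1 or newer.

```shell
# encrypt_file IN OUT / decrypt_file IN OUT: hypothetical wrappers around
# `openssl enc`. The passphrase is read from the exported environment
# variable BACKUP_PASSPHRASE (an assumed name for this example).
# -pbkdf2 assumes OpenSSL 1.1.1 or newer.
encrypt_file() {
    openssl enc -aes-256-cbc -salt -pbkdf2 -in "$1" -out "$2" -pass env:BACKUP_PASSPHRASE
}

decrypt_file() {
    openssl enc -d -aes-256-cbc -pbkdf2 -in "$1" -out "$2" -pass env:BACKUP_PASSPHRASE
}

# Usage:
#   export BACKUP_PASSPHRASE='choose-a-strong-passphrase'
#   encrypt_file data.txt data.enc
#   decrypt_file data.enc data.restored.txt
```

The same key-derivation options must be used for encryption and decryption, which is why both functions pass -pbkdf2.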
CODE_BLOCK:
nano file.txt
vi file.txt
COMMAND_BLOCK:
# Secure File Transfer Protocol ~ used for transferring files (the host can be an IP address or a hostname)
sftp username@hostname
# copy files locally (add -r for directories)
cp file1.txt /backup/file1.txt
# move or rename files
mv filename renamedfile
# transfer files remotely over ssh
scp data.csv username@ip_address:/data
# synchronize directories remotely for incremental updates (the remote path /backup/ is an example)
rsync -avz /backup/log username@ip_address:/backup/
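After moving data files around, it can be worth confirming that the copy matches the source. The sketch below does this for a local copy by comparing SHA-256 checksums. copy_verified is a hypothetical helper, and the example assumes sha256sum from GNU coreutils is available.

```shell
# copy_verified SRC DEST: copy a file, then compare SHA-256 checksums.
# Hypothetical helper; assumes sha256sum (GNU coreutils) is installed.
copy_verified() {
    cp "$1" "$2" || return 1
    src_sum=$(sha256sum < "$1") || return 1
    dst_sum=$(sha256sum < "$2") || return 1
    [ "$src_sum" = "$dst_sum" ]
}

work=$(mktemp -d)
mkdir "$work/backup"
echo 'id,value' > "$work/data.csv"
if copy_verified "$work/data.csv" "$work/backup/data.csv"; then
    echo "copy verified"
fi
rm -r "$work"
```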
## Prerequisites
- Set up a Linux server environment for testing purposes.
- Be familiar with the command line.

## Table of Contents
- File and Directory Operations
- File System and Storage Management
- File Attributes and Permissions
- User and Group Management
- Networking and Security
- File Compression and Encryption
- File transfer commands