Django 面試精華：threading.local 原理

2025-01-14 約 2000 字預計閱讀 10 分鐘

前言

在多線程環境中，如何讓每個線程擁有自己獨立的資料副本？這是一個經典的並發編程問題。

想像一個實際場景：

# ❌ 全局變量會被所有線程共享
current_user = None

def process_request(request):
    global current_user
    current_user = request.user  # 線程 A 設置

    do_some_work()  # 在這期間，線程 B 可能修改 current_user！

    send_email(current_user)  # 可能發給錯誤的用戶！

問題：

線程 A 處理用戶 Alice 的請求
線程 B 同時處理用戶 Bob 的請求
兩個線程共享同一個 current_user 變量
結果：Alice 的郵件可能發給 Bob！

解決方案：使用 threading.local

# ✅ 每個線程有自己的 current_user
thread_local = threading.local()

def process_request(request):
    thread_local.current_user = request.user  # 線程隔離

    do_some_work()  # 其他線程不會影響

    send_email(thread_local.current_user)  # 安全！

這篇文章將深入探討 threading.local 的實現原理，以及 Django 如何利用它保證線程安全。

1. 什麼是 Thread Local Storage（TLS）

1.1 概念

Thread Local Storage（線程本地存儲，TLS） 是一種讓每個線程擁有自己獨立變量副本的機制。

特性：

✅ 線程隔離：每個線程看到的是自己的資料
✅ 自動管理：線程結束時自動清理
✅ 透明訪問：使用方式與普通變量相同

1.2 使用場景

場景	說明	範例
資料庫連接	每個線程維護獨立連接	Django ORM 連接池
請求上下文	存儲當前請求的資訊	Flask 的 `request` 對象
用戶會話	保存當前用戶資訊	認證系統
事務管理	追蹤事務狀態	Django `transaction.atomic`
日誌上下文	添加線程特定的日誌資訊	請求 ID、用戶 ID

2. threading.local 的內部實現

2.1 基本原理

threading.local 的核心思想：使用線程 ID 作為 key，存儲每個線程的資料。

簡化實現：

import threading

class SimpleLocal:
    def __init__(self):
        # 使用字典存儲，key 是線程 ID
        self._storage = {}

    def _get_thread_id(self):
        return threading.current_thread().ident

    def __setattr__(self, name, value):
        if name == '_storage':
            # 初始化 _storage 本身
            object.__setattr__(self, name, value)
        else:
            # 存儲到當前線程的空間
            thread_id = self._get_thread_id()
            if thread_id not in self._storage:
                self._storage[thread_id] = {}
            self._storage[thread_id][name] = value

    def __getattr__(self, name):
        thread_id = self._get_thread_id()
        if thread_id in self._storage:
            return self._storage[thread_id].get(name)
        raise AttributeError(f"No such attribute: {name}")


# 使用示例
local_data = SimpleLocal()

def worker(name):
    # 每個線程設置自己的變量
    local_data.user = name
    local_data.count = 0

    for i in range(3):
        local_data.count += 1
        print(f"線程 {name}: count = {local_data.count}")

# 啟動多個線程
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    executor.map(worker, ['Alice', 'Bob', 'Charlie'])

# 輸出：
# 線程 Alice: count = 1
# 線程 Bob: count = 1
# 線程 Charlie: count = 1
# 線程 Alice: count = 2
# 線程 Bob: count = 2
# 線程 Charlie: count = 2
# 線程 Alice: count = 3
# 線程 Bob: count = 3
# 線程 Charlie: count = 3

內部結構可視化：

SimpleLocal 對象
│
└─ _storage (字典)
   ├─ 12345 (線程 A ID)
   │  ├─ user = "Alice"
   │  └─ count = 3
   │
   ├─ 12346 (線程 B ID)
   │  ├─ user = "Bob"
   │  └─ count = 3
   │
   └─ 12347 (線程 C ID)
      ├─ user = "Charlie"
      └─ count = 3

2.2 Python 官方實現

Python 的 threading.local 實現更複雜，處理了：

弱引用：線程結束時自動清理資料
繼承支持：可以繼承 threading.local
描述符協議：支持屬性訪問控制

官方實現關鍵代碼（簡化）：

# Lib/threading.py (簡化版本)

class local:
    __slots__ = '_local__impl', '_local__lock', '__dict__'

    def __new__(cls, *args, **kwargs):
        # 每個 local 對象都有自己的實現
        self = object.__new__(cls)
        impl = _localimpl()
        impl.localargs = (args, kwargs)
        impl.locallock = RLock()
        object.__setattr__(self, '_local__impl', impl)
        object.__setattr__(self, '_local__lock', impl.locallock)

        # 註冊當前線程
        impl.create_dict()
        return self

    def __getattribute__(self, name):
        # 獲取當前線程的字典
        impl = object.__getattribute__(self, '_local__impl')
        dct = impl.get_dict()
        return dct[name]

    def __setattr__(self, name, value):
        # 設置到當前線程的字典
        impl = object.__getattribute__(self, '_local__impl')
        dct = impl.get_dict()
        dct[name] = value


class _localimpl:
    def get_dict(self):
        # 獲取當前線程的字典
        thread_id = get_ident()
        return self.dicts.get(thread_id, {})

    def create_dict(self):
        # 為當前線程創建字典
        thread_id = get_ident()
        d = {}
        self.dicts[thread_id] = d
        return d

關鍵特性：

弱引用清理：線程結束後，對應的資料會被垃圾回收
延遲初始化：只在線程首次訪問時創建字典
線程安全：使用鎖保護內部狀態

3. Django 中的 threading.local 應用

3.1 資料庫連接管理

Django 使用 threading.local 管理資料庫連接，確保每個線程有自己的連接。

實現位置：django/db/backends/base/base.py

# django/db/backends/base/base.py (簡化)

class BaseDatabaseWrapper:
    def __init__(self, settings_dict, alias=DEFAULT_DB_ALIAS):
        # 使用 threading.local 存儲連接
        self._thread_local = threading.local()
        self.settings_dict = settings_dict
        self.alias = alias

    @property
    def connection(self):
        """獲取當前線程的資料庫連接"""
        # 檢查當前線程是否已有連接
        if not hasattr(self._thread_local, 'connection'):
            # 如果沒有，創建新連接
            self._thread_local.connection = self.get_new_connection(
                self.get_connection_params()
            )
        return self._thread_local.connection

    def close(self):
        """關閉當前線程的連接"""
        if hasattr(self._thread_local, 'connection'):
            self._thread_local.connection.close()
            del self._thread_local.connection


# 使用示例
from django.db import connection

def view_function(request):
    # 線程 A 執行查詢
    users = User.objects.all()  # 使用線程 A 的連接

    # 線程 B 同時執行查詢
    # 會自動使用線程 B 的連接，不會衝突！

連接池可視化：

DatabaseWrapper 對象
│
└─ _thread_local
   ├─ 線程 A (ID: 12345)
   │  └─ connection → MySQL Connection #1
   │
   ├─ 線程 B (ID: 12346)
   │  └─ connection → MySQL Connection #2
   │
   └─ 線程 C (ID: 12347)
      └─ connection → MySQL Connection #3

3.2 事務管理

Django 的事務狀態也使用 threading.local 來隔離。

實現位置：django/db/transaction.py

# django/db/transaction.py (簡化)

class Atomic:
    def __init__(self, using=None, savepoint=True):
        self.using = using
        self.savepoint = savepoint

    def __enter__(self):
        connection = get_connection(self.using)

        # 檢查當前線程的事務狀態
        if not connection.in_atomic_block:
            # 開始新事務
            connection.set_autocommit(False)

        # 保存當前線程的事務狀態
        connection.in_atomic_block = True
        connection.savepoint_ids.append(
            connection.savepoint() if self.savepoint else None
        )

    def __exit__(self, exc_type, exc_value, traceback):
        connection = get_connection(self.using)

        if exc_type is None:
            # 提交當前線程的事務
            connection.commit()
        else:
            # 回滾當前線程的事務
            connection.rollback()


# 使用示例
from django.db import transaction

def transfer_money(from_user, to_user, amount):
    with transaction.atomic():
        # 這個事務只影響當前線程
        from_user.balance -= amount
        from_user.save()

        to_user.balance += amount
        to_user.save()

        # 其他線程的事務不受影響

3.3 請求上下文（中間件）

雖然 Django 不像 Flask 那樣使用全局的 request 對象，但在某些場景下也會用 threading.local 存儲請求上下文。

自定義中間件示例：

import threading

# 全局的 thread local 對象
_thread_locals = threading.local()

class ThreadLocalMiddleware:
    """在 threading.local 中存儲請求對象"""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # 將請求存儲到當前線程
        _thread_locals.request = request
        _thread_locals.user = request.user

        response = self.get_response(request)

        # 清理（可選）
        if hasattr(_thread_locals, 'request'):
            del _thread_locals.request
        if hasattr(_thread_locals, 'user'):
            del _thread_locals.user

        return response


def get_current_user():
    """在任何地方獲取當前用戶"""
    return getattr(_thread_locals, 'user', None)


# 在視圖、模型、工具函數中都可以使用
def some_utility_function():
    user = get_current_user()
    if user and user.is_authenticated:
        log.info(f"用戶 {user.username} 執行了操作")

配置中間件：

# settings.py
MIDDLEWARE = [
    'myapp.middleware.ThreadLocalMiddleware',  # 添加自定義中間件
    'django.middleware.security.SecurityMiddleware',
    # ...
]

4. threading.local 最佳實踐

4.1 適合使用的場景

✅ 推薦使用：

連接管理：資料庫連接、HTTP 連接池
請求上下文：存儲當前請求的元資料
事務狀態：追蹤事務的開始、提交、回滾
認證資訊：當前用戶、權限資訊
日誌上下文：請求 ID、追蹤 ID

示例：日誌上下文

import threading
import logging

# 創建 thread local 對象
thread_local = threading.local()

class RequestIdFilter(logging.Filter):
    """在日誌中添加請求 ID"""

    def filter(self, record):
        record.request_id = getattr(thread_local, 'request_id', 'N/A')
        return True


# 配置日誌
logging.basicConfig(
    format='%(asctime)s [%(request_id)s] %(message)s',
    level=logging.INFO
)
logger = logging.getLogger()
logger.addFilter(RequestIdFilter())


# 在中間件中設置請求 ID
class RequestIdMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        import uuid
        thread_local.request_id = str(uuid.uuid4())[:8]

        response = self.get_response(request)
        return response


# 在任何地方記錄日誌都會帶上請求 ID
def my_view(request):
    logger.info("處理用戶請求")  # 輸出: 2025-01-14 11:40:00 [a3b4c5d6] 處理用戶請求
    return HttpResponse("OK")

4.2 注意事項

❌ 不適合使用：

大量資料存儲：會增加記憶體使用
跨線程通信：threading.local 無法在線程間共享
異步環境：asyncio 中應使用 contextvars

陷阱 1：線程池復用

import threading
from concurrent.futures import ThreadPoolExecutor

thread_local = threading.local()

def worker(task_id):
    # ⚠️ 線程池中的線程會被復用！
    if not hasattr(thread_local, 'counter'):
        thread_local.counter = 0

    thread_local.counter += 1
    print(f"任務 {task_id}: counter = {thread_local.counter}")


# 使用線程池（線程會被復用）
with ThreadPoolExecutor(max_workers=2) as executor:
    for i in range(6):
        executor.submit(worker, i)

# 可能的輸出：
# 任務 0: counter = 1
# 任務 1: counter = 1
# 任務 2: counter = 2  # 線程被復用，counter 累加！
# 任務 3: counter = 2
# 任務 4: counter = 3  # 繼續累加
# 任務 5: counter = 3

解決方案：每次任務開始時重置

def worker(task_id):
    # ✅ 顯式重置
    thread_local.counter = 0

    thread_local.counter += 1
    print(f"任務 {task_id}: counter = {thread_local.counter}")

陷阱 2：記憶體泄漏

import threading

thread_local = threading.local()

def create_many_threads():
    threads = []
    for i in range(1000):
        t = threading.Thread(target=lambda: setattr(thread_local, 'data', [0] * 1000000))
        t.start()
        threads.append(t)

    for t in threads:
        t.join()

    # ⚠️ 線程結束後，thread_local 的資料應該被清理
    # 但如果線程對象還被引用，資料不會被釋放！


# ✅ 正確做法：不持有線程對象的引用
def create_many_threads_correct():
    for i in range(1000):
        t = threading.Thread(target=lambda: setattr(thread_local, 'data', [0] * 1000000))
        t.start()
        t.join()  # 等待線程結束，然後立即釋放

陷阱 3：異步環境下失效

import asyncio
import threading

thread_local = threading.local()

async def async_worker(name):
    # ❌ 在 asyncio 中，多個協程可能在同一個線程中執行
    thread_local.user = name
    await asyncio.sleep(0.1)  # 切換協程

    print(f"{name} 看到的用戶: {thread_local.user}")
    # 可能被其他協程修改！


# 運行
asyncio.run(asyncio.gather(
    async_worker("Alice"),
    async_worker("Bob"),
))

# 可能的輸出：
# Alice 看到的用戶: Bob  # 錯誤！
# Bob 看到的用戶: Bob


# ✅ 正確做法：使用 contextvars
from contextvars import ContextVar

user_var = ContextVar('user', default=None)

async def async_worker_correct(name):
    user_var.set(name)
    await asyncio.sleep(0.1)

    print(f"{name} 看到的用戶: {user_var.get()}")
    # 正確！每個協程有自己的上下文


asyncio.run(asyncio.gather(
    async_worker_correct("Alice"),
    async_worker_correct("Bob"),
))

# 輸出：
# Alice 看到的用戶: Alice
# Bob 看到的用戶: Bob

5. 替代方案對比

方案	適用場景	優點	缺點
threading.local	多線程環境	簡單、自動管理	不支持協程
contextvars.ContextVar	異步環境（asyncio）	支持協程、任務	Python 3.7+
全局字典 + 線程 ID	需要細粒度控制	靈活	需要手動管理、容易泄漏
函數參數傳遞	簡單場景	最安全、顯式	冗長、不適合深層調用

選擇指南：

# 1. 如果使用傳統多線程（WSGI、threading）
import threading
thread_local = threading.local()

# 2. 如果使用 asyncio、ASGI（FastAPI、Starlette）
from contextvars import ContextVar
context_var = ContextVar('name')

# 3. 如果調用鏈很短（< 3 層）
def view(request):
    process_data(request.user)  # 直接傳遞

# 4. 如果需要跨線程共享
from threading import Lock
shared_data = {}
shared_lock = Lock()

6. 實戰案例：實現請求追蹤系統

6.1 需求

實現一個請求追蹤系統，自動為每個請求生成唯一 ID，並在所有日誌中顯示。

6.2 實現

# middleware/request_tracking.py

import threading
import uuid
import logging
import time

# 創建 thread local 對象
_request_context = threading.local()


class RequestTrackingMiddleware:
    """請求追蹤中間件"""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # 生成請求 ID
        request_id = str(uuid.uuid4())
        request.id = request_id

        # 存儲到 thread local
        _request_context.request_id = request_id
        _request_context.user_id = request.user.id if request.user.is_authenticated else None
        _request_context.path = request.path
        _request_context.method = request.method
        _request_context.start_time = time.time()

        # 處理請求
        response = self.get_response(request)

        # 計算耗時
        duration = time.time() - _request_context.start_time

        # 記錄請求日誌
        logger.info(
            f"{request.method} {request.path} "
            f"[{response.status_code}] "
            f"{duration*1000:.2f}ms"
        )

        # 清理上下文
        self._clear_context()

        return response

    def _clear_context(self):
        """清理線程本地資料"""
        for attr in ['request_id', 'user_id', 'path', 'method', 'start_time']:
            if hasattr(_request_context, attr):
                delattr(_request_context, attr)


# 自定義日誌過濾器
class RequestContextFilter(logging.Filter):
    """在日誌中添加請求上下文"""

    def filter(self, record):
        # 從 thread local 獲取資訊
        record.request_id = getattr(_request_context, 'request_id', '-')
        record.user_id = getattr(_request_context, 'user_id', '-')
        return True


# 配置日誌
logger = logging.getLogger('django')
logger.addFilter(RequestContextFilter())


# 工具函數：獲取當前請求 ID
def get_current_request_id():
    """在任何地方獲取當前請求 ID"""
    return getattr(_request_context, 'request_id', None)


def get_current_user_id():
    """在任何地方獲取當前用戶 ID"""
    return getattr(_request_context, 'user_id', None)


# 使用示例
def some_service_function():
    """在業務邏輯中使用請求上下文"""
    request_id = get_current_request_id()
    user_id = get_current_user_id()

    logger.info(f"執行業務邏輯")
    # 日誌輸出: [a3b4c5d6] [user:123] 執行業務邏輯

    # 調用外部 API 時傳遞 request_id
    response = requests.post(
        'https://api.example.com/endpoint',
        headers={'X-Request-ID': request_id}
    )

配置：

# settings.py

MIDDLEWARE = [
    'myapp.middleware.request_tracking.RequestTrackingMiddleware',  # 第一個
    'django.middleware.security.SecurityMiddleware',
    # ...
]

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'detailed': {
            'format': '%(asctime)s [%(request_id)s] [user:%(user_id)s] %(levelname)s %(message)s'
        },
    },
    'filters': {
        'request_context': {
            '()': 'myapp.middleware.request_tracking.RequestContextFilter',
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'detailed',
            'filters': ['request_context'],
        },
    },
    'root': {
        'handlers': ['console'],
        'level': 'INFO',
    },
}

效果：

2025-01-14 11:40:00 [a3b4c5d6] [user:123] INFO 開始處理訂單
2025-01-14 11:40:01 [a3b4c5d6] [user:123] INFO 檢查庫存
2025-01-14 11:40:02 [a3b4c5d6] [user:123] INFO 創建訂單成功
2025-01-14 11:40:03 [a3b4c5d6] [user:123] INFO POST /api/orders/ [201] 3120.45ms

7. 面試常見問題

Q1: threading.local 和全局變量有什麼區別？

答案：

特性	全局變量	threading.local
共享範圍	所有線程共享	每個線程獨立
線程安全	需要加鎖	自動線程安全
資料隔離	❌ 無隔離	✅ 完全隔離
清理機制	需要手動清理	線程結束自動清理

# 全局變量 - 所有線程共享
count = 0

def worker():
    global count
    for _ in range(10000):
        count += 1  # 需要加鎖！


# threading.local - 每個線程獨立
thread_local = threading.local()

def worker():
    thread_local.count = 0
    for _ in range(10000):
        thread_local.count += 1  # 不需要加鎖

Q2: Django 為什麼要用 threading.local 管理資料庫連接？

答案：

避免連接競爭：多個線程共享一個連接會導致查詢混亂
自動管理：每個線程創建、使用、關閉自己的連接
性能優化：避免頻繁創建連接（連接復用）
事務隔離：每個線程的事務互不影響

# 如果不用 threading.local（錯誤示例）
global_connection = create_connection()

def view1(request):
    global_connection.execute("BEGIN")  # 線程 A 開始事務
    # ...

def view2(request):
    global_connection.execute("BEGIN")  # 線程 B 開始事務
    # ❌ 衝突！兩個事務共用一個連接


# 使用 threading.local（正確）
class DatabaseWrapper:
    def __init__(self):
        self._local = threading.local()

    @property
    def connection(self):
        if not hasattr(self._local, 'conn'):
            self._local.conn = create_connection()  # 每個線程獨立連接
        return self._local.conn

Q3: threading.local 在異步環境（asyncio）下能用嗎？

答案：

不能，因為：

threading.local 基於線程 ID 隔離資料
asyncio 中多個協程可能在同一個線程中執行
結果：多個協程共享同一份資料

示例：

import asyncio
import threading

thread_local = threading.local()

async def task(name):
    thread_local.name = name
    await asyncio.sleep(0.1)  # 切換協程
    print(f"{name} 看到的 name: {thread_local.name}")  # 可能被覆蓋


asyncio.run(asyncio.gather(task("A"), task("B")))
# 輸出: A 看到的 name: B  ❌ 錯誤！

正確方案：使用 contextvars

from contextvars import ContextVar

name_var = ContextVar('name')

async def task(name):
    name_var.set(name)
    await asyncio.sleep(0.1)
    print(f"{name} 看到的 name: {name_var.get()}")  # 正確


asyncio.run(asyncio.gather(task("A"), task("B")))
# 輸出:
# A 看到的 name: A  ✅
# B 看到的 name: B  ✅

Q4: 如何檢測 threading.local 導致的記憶體泄漏？

答案：

工具：

memory_profiler：分析記憶體使用
tracemalloc：Python 內建記憶體追蹤
objgraph：可視化對象引用

示例：

import threading
import tracemalloc

thread_local = threading.local()

def worker():
    # 存儲大量資料
    thread_local.data = [0] * 1000000  # 分配 1MB
    # 線程結束


# 開始追蹤
tracemalloc.start()

# 創建大量線程
threads = []
for i in range(100):
    t = threading.Thread(target=worker)
    t.start()
    threads.append(t)  # ❌ 持有線程引用！

for t in threads:
    t.join()

# 查看記憶體使用
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

for stat in top_stats[:10]:
    print(stat)

# 輸出：可能顯示 100MB 未釋放（100 個線程 × 1MB）


# ✅ 修復：不持有線程引用
for i in range(100):
    t = threading.Thread(target=worker)
    t.start()
    t.join()  # 立即等待並釋放

# 記憶體正常釋放

Q5: 能給 threading.local 對象設置初始值嗎？

答案：

不能直接設置，但可以通過繼承和重寫 __init__ 實現。

import threading

# ❌ 這樣不行
thread_local = threading.local()
thread_local.count = 0  # 只在當前線程有效！


# ✅ 正確方法：繼承 threading.local
class MyLocal(threading.local):
    def __init__(self):
        # 每個線程第一次訪問時都會調用
        self.count = 0
        self.name = "default"


my_local = MyLocal()

def worker(id):
    print(f"線程 {id}: count = {my_local.count}")  # 都是 0
    my_local.count += 1
    print(f"線程 {id}: count = {my_local.count}")  # 都是 1


import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    executor.map(worker, [1, 2, 3])

# 輸出：
# 線程 1: count = 0
# 線程 1: count = 1
# 線程 2: count = 0
# 線程 2: count = 1
# 線程 3: count = 0
# 線程 3: count = 1

8. 總結

8.1 核心要點

threading.local 本質：
- 使用線程 ID 作為 key 的字典
- 每個線程有獨立的資料副本
- 線程結束時自動清理
Django 的應用：
- 資料庫連接池（每個線程獨立連接）
- 事務管理（隔離不同線程的事務狀態）
- 請求上下文（存儲當前請求的元資料）
最佳實踐：
- ✅ 適合：連接管理、請求上下文、事務狀態
- ❌ 不適合：大量資料、跨線程通信、異步環境
- ⚠️ 注意：線程池復用、記憶體泄漏、異步環境失效
替代方案：
- contextvars.ContextVar：異步環境（asyncio）
- 函數參數傳遞：簡單場景
- 全局字典 + 鎖：需要跨線程共享

8.2 決策樹

需要線程隔離的資料？
├─ Yes → 使用什麼環境？
│  ├─ 多線程（WSGI） → threading.local ✅
│  ├─ 異步（ASGI、asyncio） → contextvars.ContextVar ✅
│  └─ 混合環境 → 兩者都用
└─ No → 需要跨線程共享？
   ├─ Yes → 使用全局變量 + 鎖
   └─ No → 直接用函數參數傳遞

參考資料

官方文檔：
深入閱讀：
工具：
- memory_profiler - 記憶體分析
- objgraph - 對象引用可視化
- tracemalloc - 內建記憶體追蹤

下一篇預告：第 12 章 - Django 部署策略