本節(jié)繼續(xù)介紹排序的實(shí)現(xiàn),主要內(nèi)容是tuplesort_puttupleslot->puttuple_common調(diào)用的inittapes和dumptuples函數(shù).
創(chuàng)新互聯(lián)建站服務(wù)項(xiàng)目包括霞山網(wǎng)站建設(shè)、霞山網(wǎng)站制作、霞山網(wǎng)頁(yè)制作以及霞山網(wǎng)絡(luò)營(yíng)銷(xiāo)策劃等。多年來(lái),我們專(zhuān)注于互聯(lián)網(wǎng)行業(yè),利用自身積累的技術(shù)優(yōu)勢(shì)、行業(yè)經(jīng)驗(yàn)、深度合作伙伴關(guān)系等,向廣大中小型企業(yè)、政府機(jī)構(gòu)等提供互聯(lián)網(wǎng)行業(yè)的解決方案,霞山網(wǎng)站推廣取得了明顯的社會(huì)效益與經(jīng)濟(jì)效益。目前,我們服務(wù)的客戶(hù)以成都為中心已經(jīng)輻射到霞山省份的部分城市,未來(lái)相信會(huì)繼續(xù)擴(kuò)大服務(wù)區(qū)域并繼續(xù)獲得客戶(hù)的支持與信任!
在內(nèi)存不能滿(mǎn)足排序需求時(shí),使用了Polyphase Merging排序
Tuplesortstate
Tuplesort操作的私有狀態(tài).
/*
* Possible states of a Tuplesort object. These denote the states that
* persist between calls of Tuplesort routines.
* Tuplesort對(duì)象可能的狀態(tài).
* 這些標(biāo)示在Tuplesort例程調(diào)用之間會(huì)持久存在的狀態(tài).
*/
typedef enum
{
//裝載元組,在內(nèi)存限制之內(nèi)
TSS_INITIAL, /* Loading tuples; still within memory limit */
//裝載元組到有界堆中
TSS_BOUNDED, /* Loading tuples into bounded-size heap */
//裝載元組,寫(xiě)入到tape中
TSS_BUILDRUNS, /* Loading tuples; writing to tape */
//完全在內(nèi)存中完成排序
TSS_SORTEDINMEM, /* Sort completed entirely in memory */
//完成排序,最后在tape上執(zhí)行排序
TSS_SORTEDONTAPE, /* Sort completed, final run is on tape */
//不落地執(zhí)行最后的歸并
TSS_FINALMERGE /* Performing final merge on-the-fly */
} TupSortStatus;
/*
* Parameters for calculation of number of tapes to use --- see inittapes()
* and tuplesort_merge_order().
* 用于計(jì)算需要使用多少個(gè)tapes的參數(shù).--- 詳細(xì)參見(jiàn)inittapes()和tuplesort_merge_order().
*
* In this calculation we assume that each tape will cost us about 1 blocks
* worth of buffer space. This ignores the overhead of all the other data
* structures needed for each tape, but it's probably close enough.
* 在這個(gè)計(jì)算中,我們假定每一個(gè)tape會(huì)大概消耗緩存空間的一個(gè)block.
* 雖然已經(jīng)忽略了所有每個(gè)tape依賴(lài)的其他數(shù)據(jù)結(jié)構(gòu),但已經(jīng)非常接近了.
*
* MERGE_BUFFER_SIZE is how much data we'd like to read from each input
* tape during a preread cycle (see discussion at top of file).
* MERGE_BUFFER_SIZE表示在每一輪讀周期中我們將要從每個(gè)輸入taple中讀取的數(shù)據(jù)大小
*/
#define MINORDER 6 /* minimum merge order */
#define MAXORDER 500 /* maximum merge order */
#define TAPE_BUFFER_OVERHEAD BLCKSZ
#define MERGE_BUFFER_SIZE (BLCKSZ * 32)
typedef int (*SortTupleComparator) (const SortTuple *a, const SortTuple *b,
Tuplesortstate *state);
/*
* Private state of a Tuplesort operation.
* Tuplesort操作的私有狀態(tài).
*/
struct Tuplesortstate
{
//狀態(tài) : 枚舉值詳見(jiàn)上面的信息
TupSortStatus status; /* enumerated value as shown above */
//sort key中的列數(shù)
int nKeys; /* number of columns in sort key */
//調(diào)用者需要隨機(jī)訪(fǎng)問(wèn)?
bool randomAccess; /* did caller request random access? */
//調(diào)用者是否指定了最大返回的元組的數(shù)目?
bool bounded; /* did caller specify a maximum number of
* tuples to return? */
//使用有界堆,則返回T
bool boundUsed; /* true if we made use of a bounded heap */
//如為有界堆,這里存儲(chǔ)最大的元組個(gè)數(shù)
int bound; /* if bounded, the maximum number of tuples */
//SortTuple.tuple是否可以設(shè)置?
bool tuples; /* Can SortTuple.tuple ever be set? */
//剩余可用內(nèi)存大小(單位:字節(jié))
int64 availMem; /* remaining memory available, in bytes */
//允許的內(nèi)存總大小(單位:字節(jié))
int64 allowedMem; /* total memory allowed, in bytes */
//tapes個(gè)數(shù)
int maxTapes; /* number of tapes (Knuth's T) */
//tapes個(gè)數(shù) - 1
int tapeRange; /* maxTapes-1 (Knuth's P) */
//主要用于排序數(shù)據(jù)的內(nèi)存上下文
MemoryContext sortcontext; /* memory context holding most sort data */
//用于元組數(shù)據(jù)的sortcontext的子上下文
MemoryContext tuplecontext; /* sub-context of sortcontext for tuple data */
//臨時(shí)文件中tapes的logtape.c對(duì)象
LogicalTapeSet *tapeset; /* logtape.c object for tapes in a temp file */
/*
* These function pointers decouple the routines that must know what kind
* of tuple we are sorting from the routines that don't need to know it.
* They are set up by the tuplesort_begin_xxx routines.
* 這些函數(shù)指針將必須知道排序的哪種元組的例程與不需要知道它的例程解耦.
*
* Function to compare two tuples; result is per qsort() convention, ie:
* <0, 0, >0 according as a<b, a=b, a>b. The API must match
* qsort_arg_comparator.
* 比較兩個(gè)元組的函數(shù),結(jié)果由每個(gè)qsort()約定,比如:
* < 0, 0, >0代表a<b,a=b,a>b.API必須匹配qsort_arg_comparator.
*/
SortTupleComparator comparetup;
/*
* Function to copy a supplied input tuple into palloc'd space and set up
* its SortTuple representation (ie, set tuple/datum1/isnull1). Also,
* state->availMem must be decreased by the amount of space used for the
* tuple copy (note the SortTuple struct itself is not counted).
* 該函數(shù)用于拷貝一個(gè)輸入的元組到由palloc分配的內(nèi)存空間中,
* 同時(shí)設(shè)置SortTuple數(shù)據(jù)結(jié)構(gòu)(比如設(shè)置tuple/datum1/isnull1等).
* 同時(shí),state->availMem必須減去用于元組拷貝的空間大小(注意:SortTuple結(jié)構(gòu)體不計(jì)算在內(nèi)).
*/
void (*copytup) (Tuplesortstate *state, SortTuple *stup, void *tup);
/*
* Function to write a stored tuple onto tape. The representation of the
* tuple on tape need not be the same as it is in memory; requirements on
* the tape representation are given below. Unless the slab allocator is
* used, after writing the tuple, pfree() the out-of-line data (not the
* SortTuple struct!), and increase state->availMem by the amount of
* memory space thereby released.
* 用于寫(xiě)入元組到taple的函數(shù).
* tape中的元組聲明不需要與內(nèi)存中的一致,tape中的聲明要求詳見(jiàn)下面說(shuō)明.
* 除非使用slab分配器,在寫(xiě)入元組后,pfree() out-of-line的數(shù)據(jù)(不是SortTuple結(jié)構(gòu)體),
* 同時(shí)把剛才釋放的內(nèi)存空間加到state->availMem中.
*/
void (*writetup) (Tuplesortstate *state, int tapenum,
SortTuple *stup);
/*
* Function to read a stored tuple from tape back into memory. 'len' is
* the already-read length of the stored tuple. The tuple is allocated
* from the slab memory arena, or is palloc'd, see readtup_alloc().
* 從tape中讀取元組到內(nèi)存中的函數(shù).
* 'len'是已讀取的存儲(chǔ)元組的長(zhǎng)度.元組在slab內(nèi)存空間/palloc中分配,詳細(xì)參考readtup_alloc()函數(shù)
*/
void (*readtup) (Tuplesortstate *state, SortTuple *stup,
int tapenum, unsigned int len);
/*
* This array holds the tuples now in sort memory. If we are in state
* INITIAL, the tuples are in no particular order; if we are in state
* SORTEDINMEM, the tuples are in final sorted order; in states BUILDRUNS
* and FINALMERGE, the tuples are organized in "heap" order per Algorithm
* H. In state SORTEDONTAPE, the array is not used.
* 該數(shù)組保存排序內(nèi)存中的元組.當(dāng)前狀態(tài)為
* INITIAL:元組沒(méi)有特定的順序;
* SORTEDINMEM:元組處于最終已排序的狀態(tài);
* BUILDRUNS/FINALMERGE:元組按算法H的'堆'順序組織.
* SORTEDONTAPE:數(shù)組未使用.
*/
//SortTuple結(jié)構(gòu)體數(shù)組
SortTuple *memtuples; /* array of SortTuple structs */
//當(dāng)前存在的元組數(shù)
int memtupcount; /* number of tuples currently present */
//memtuples數(shù)組的已分配的大小
int memtupsize; /* allocated length of memtuples array */
//memtuples的增長(zhǎng)仍在進(jìn)行中?
bool growmemtuples; /* memtuples' growth still underway? */
/*
* Memory for tuples is sometimes allocated using a simple slab allocator,
* rather than with palloc(). Currently, we switch to slab allocation
* when we start merging. Merging only needs to keep a small, fixed
* number of tuples in memory at any time, so we can avoid the
* palloc/pfree overhead by recycling a fixed number of fixed-size slots
* to hold the tuples.
* 有時(shí)候元組的內(nèi)存分配使用簡(jiǎn)單的slab分配器實(shí)現(xiàn)而不是palloc().
* 在開(kāi)始?xì)w并時(shí),同步切換至slab分配器.歸并只需要在內(nèi)存中保持簡(jiǎn)單/固定的元組數(shù)目,
* 因此可以避免頻繁回收固定數(shù)目固定大小的slots(用于保存元組)而導(dǎo)致的palloc/pfree過(guò)載.
*
* For the slab, we use one large allocation, divided into SLAB_SLOT_SIZE
* slots. The allocation is sized to have one slot per tape, plus one
* additional slot. We need that many slots to hold all the tuples kept
* in the heap during merge, plus the one we have last returned from the
* sort, with tuplesort_gettuple.
* 對(duì)于slab,使用大型分配器,拆分為SLAB_SLOT_SIZE個(gè)大小的slots.
* 分配器的大小為每個(gè)tape一個(gè)slot,外加一個(gè)為的slot.
* 我們需要這么多slots是因?yàn)樵跉w并期間需要保存所有在堆中的元組,外加tuplesort_gettuple從排序中最終返回的元組
*
* Initially, all the slots are kept in a linked list of free slots. When
* a tuple is read from a tape, it is put to the next available slot, if
* it fits. If the tuple is larger than SLAB_SLOT_SIZE, it is palloc'd
* instead.
* 一開(kāi)始,所有的slots在空閑slots中以鏈表的方式保存.
* 從tape中讀取元組時(shí),如合適的話(huà),元組會(huì)放到下一個(gè)可用slot中.
* 如果元組比SLAB_SLOT_SIZE要大,改用palloc分配內(nèi)存空間.
*
* When we're done processing a tuple, we return the slot back to the free
* list, or pfree() if it was palloc'd. We know that a tuple was
* allocated from the slab, if its pointer value is between
* slabMemoryBegin and -End.
* 如果已完成元組的處理,返回slot到空閑鏈表中,如果使用palloc分配則使用pfree回收空間.
* 如果元組指針值在slabMemoryBegin和slabMemoryEnd之間,那么我們可以知道元組是從slab中分配的.
*
* When the slab allocator is used, the USEMEM/LACKMEM mechanism of
* tracking memory usage is not used.
* 如使用了slab分配器,不會(huì)使用USEMEM/LACKMEM機(jī)制跟蹤內(nèi)存使用.
*/
bool slabAllocatorUsed;
//slab內(nèi)存空間的起始位置
char *slabMemoryBegin; /* beginning of slab memory arena */
//slab內(nèi)存空間的結(jié)束位置
char *slabMemoryEnd; /* end of slab memory arena */
//鏈表頭
SlabSlot *slabFreeHead; /* head of free list */
/* Buffer size to use for reading input tapes, during merge. */
//在歸并期間用于讀取輸入tapes的緩存大小
size_t read_buffer_size;
/*
* When we return a tuple to the caller in tuplesort_gettuple_XXX, that
* came from a tape (that is, in TSS_SORTEDONTAPE or TSS_FINALMERGE
* modes), we remember the tuple in 'lastReturnedTuple', so that we can
* recycle the memory on next gettuple call.
* 通過(guò)tuplesort_gettuple_XXX方法調(diào)用返回元組給調(diào)用者時(shí),元組從tape中獲取
* (也就是說(shuō),TSS_SORTEDONTAPE/TSS_FINALMERGE模式),這時(shí)候會(huì)把元組放在'lastReturnedTuple'中,
* 因此可在下次gettuple調(diào)用中回收內(nèi)存.
*/
void *lastReturnedTuple;
/*
* While building initial runs, this is the current output run number.
* Afterwards, it is the number of initial runs we made.
* 在構(gòu)建initial運(yùn)行時(shí),這是當(dāng)前輸出的run編號(hào).
* 后續(xù)這是我們構(gòu)建好的initial runs的編號(hào).
*/
int currentRun;
/*
* Unless otherwise noted, all pointer variables below are pointers to
* arrays of length maxTapes, holding per-tape data.
* 除非特別注意,下面所有的指針變量是指向長(zhǎng)度為maxTapes的數(shù)組,保存per-tape數(shù)據(jù).
*/
/*
* This variable is only used during merge passes. mergeactive[i] is true
* if we are reading an input run from (actual) tape number i and have not
* yet exhausted that run.
* 該變量在歸并過(guò)程中使用.
* mergeactive[i]為T(mén)如果我們從編號(hào)為i的tape中讀取數(shù)據(jù),仍未在該run中消耗完畢.
*/
//活躍的輸入run源?
bool *mergeactive; /* active input run source? */
/*
* Variables for Algorithm D. Note that destTape is a "logical" tape
* number, ie, an index into the tp_xxx[] arrays. Be careful to keep
* "logical" and "actual" tape numbers straight!
* 用于算法D的變量.
* 注意destTape是一個(gè)邏輯tape編號(hào),例如,是指向tp_xxx[]數(shù)組的索引.
* 注意保持"邏輯"和"實(shí)際"tape編號(hào)的連續(xù)性.
*/
//Knuth's l
int Level; /* Knuth's l */
//當(dāng)前輸出tape(Knuth's j)
int destTape; /* current output tape (Knuth's j, less 1) */
//目標(biāo)斐波拉契run計(jì)數(shù)(A[])
int *tp_fib; /* Target Fibonacci run counts (A[]) */
//每一個(gè)tape上真正runs的編號(hào)
int *tp_runs; /* # of real runs on each tape */
//每一個(gè)tape(D[])上虛擬runs的編號(hào)
int *tp_dummy; /* # of dummy runs for each tape (D[]) */
//實(shí)際的tape編號(hào)(TAPE[])
int *tp_tapenum; /* Actual tape numbers (TAPE[]) */
//歸并輪中的活動(dòng)輸入tapes編號(hào)
int activeTapes; /* # of active input tapes in merge pass */
/*
* These variables are used after completion of sorting to keep track of
* the next tuple to return. (In the tape case, the tape's current read
* position is also critical state.)
* 這些變量用于在排序完成后保持下一個(gè)返回元組時(shí)的跟蹤.
* (在tape情況下,tape的當(dāng)前讀取位置也是重要的狀態(tài))
*/
//已完成輸出的實(shí)際tape編號(hào)
int result_tape; /* actual tape number of finished output */
//數(shù)組編號(hào)(僅用于SORTEDINMEM)
int current; /* array index (only used if SORTEDINMEM) */
//是否到達(dá)EOF(用于游標(biāo))
bool eof_reached; /* reached EOF (needed for cursors) */
/* markpos_xxx holds marked position for mark and restore */
//markpos_xxx保持已標(biāo)記的位置,用于標(biāo)記和存儲(chǔ)
//tape block編號(hào)(只用于SORTEDONTAPE)
long markpos_block; /* tape block# (only used if SORTEDONTAPE) */
//存儲(chǔ)的"current",或者tape塊中的偏移
int markpos_offset; /* saved "current", or offset in tape block */
//存儲(chǔ)的eof_reached
bool markpos_eof; /* saved "eof_reached" */
/*
* These variables are used during parallel sorting.
* 這些變量用于并行排序.
*
* worker is our worker identifier. Follows the general convention that
* -1 value relates to a leader tuplesort, and values >= 0 worker
* tuplesorts. (-1 can also be a serial tuplesort.)
* worker是worker標(biāo)識(shí)符ID.
* 遵循一般約定,-1值與leader tuplesort相關(guān),并且值>= 0表示worker tuplesorts。
* (在串行tuplesort時(shí),-1也可以表示這種情況)
*
* shared is mutable shared memory state, which is used to coordinate
* parallel sorts.
* shared是可變的共享內(nèi)存狀態(tài),用于協(xié)調(diào)并行排序.
*
* nParticipants is the number of worker Tuplesortstates known by the
* leader to have actually been launched, which implies that they must
* finish a run leader can merge. Typically includes a worker state held
* by the leader process itself. Set in the leader Tuplesortstate only.
* nParticipants是已知的worker Tuplesortstates的數(shù)目,這些worker由leader感知,是實(shí)際啟動(dòng)的worker數(shù),
* 這也意味著在leader可以歸并前這些worker必須完成.
* 典型的,leader進(jìn)行自身包含至少一個(gè)worker狀態(tài).
* 只在leader的Tuplesortstate中設(shè)置.
*/
int worker;
Sharedsort *shared;
int nParticipants;
/*
* The sortKeys variable is used by every case other than the hash index
* case; it is set by tuplesort_begin_xxx. tupDesc is only used by the
* MinimalTuple and CLUSTER routines, though.
* sortKeys變量用于every而不是hash index,通過(guò)tuplesort_begin_xxx設(shè)置.
* tupDesc只由MinimalTuple和CLUSTER例程使用.
*/
TupleDesc tupDesc;
//長(zhǎng)度nKeys數(shù)組
SortSupport sortKeys; /* array of length nKeys */
/*
* This variable is shared by the single-key MinimalTuple case and the
* Datum case (which both use qsort_ssup()). Otherwise it's NULL.
* 該變量在單鍵MinimalTuple和Datum情況下(使用qsort_ssup()函數(shù))共享使用,否則的話(huà)值為NULL.
*/
SortSupport onlyKey;
/*
* Additional state for managing "abbreviated key" sortsupport routines
* (which currently may be used by all cases except the hash index case).
* Tracks the intervals at which the optimization's effectiveness is
* tested.
* 管理"縮寫(xiě)鍵"sortsupport過(guò)程的額外狀態(tài).
* (除了hash index外會(huì)被其他情況使用)
* 跟蹤在優(yōu)化器有效性測(cè)試時(shí)時(shí)間間隔.
*/
int64 abbrevNext; /* Tuple # at which to next check
* applicability */
/*
* These variables are specific to the CLUSTER case; they are set by
* tuplesort_begin_cluster.
* 這些變量?jī)H在CLUSTER時(shí)生效,通過(guò)tuplesort_begin_cluster設(shè)置.
*/
//將用于依賴(lài)的索引信息
IndexInfo *indexInfo; /* info about index being used for reference */
//解析索引表達(dá)式的運(yùn)行期狀態(tài)
EState *estate; /* for evaluating index expressions */
/*
* These variables are specific to the IndexTuple case; they are set by
* tuplesort_begin_index_xxx and used only by the IndexTuple routines.
* 這些變量?jī)H用于IndexTuple.
* 通過(guò)tuplesort_begin_index_xxx設(shè)置,僅用于IndexTuple例程.
*/
//數(shù)據(jù)表
Relation heapRel; /* table the index is being built on */
//正在創(chuàng)建的index
Relation indexRel; /* index being built */
/* These are specific to the index_btree subcase: */
//這些僅在index_btree下使用
//如發(fā)現(xiàn)重復(fù)元組,則提示
bool enforceUnique; /* complain if we find duplicate tuples */
/* These are specific to the index_hash subcase: */
//index_hash情況
uint32 high_mask; /* masks for sortable part of hash code */
uint32 low_mask;
uint32 max_buckets;
/*
* These variables are specific to the Datum case; they are set by
* tuplesort_begin_datum and used only by the DatumTuple routines.
* 這些變量用于Datum,通過(guò)tuplesort_begin_datum設(shè)置,僅用于DatumTuple例程.
*/
Oid datumType;
/* we need typelen in order to know how to copy the Datums. */
//需要typelen用于知道如何拷貝Datums.
int datumTypeLen;
/*
* Resource snapshot for time of sort start.
* 在排序開(kāi)始時(shí)的資源快照
*/
#ifdef TRACE_SORT
PGRUsage ru_start;
#endif
};
inittapes
初始化tape sorting(Polyphase Merging).
/*
* inittapes - initialize for tape sorting.
* inittapes - 初始化tape sorting(Polyphase Merging).
*
* This is called only if we have found we won't sort in memory.
* 在內(nèi)存不足以滿(mǎn)足排序需求時(shí)才調(diào)用此函數(shù).
*/
static void
inittapes(Tuplesortstate *state, bool mergeruns)
{
int maxTapes,//最大tapes
j;
Assert(!LEADER(state));
if (mergeruns)
{
/* Compute number of tapes to use: merge order plus 1 */
//計(jì)算tapes數(shù) : 歸并順序 + 1
/*
#define MINORDER 6
#define MAXORDER 500
#define TAPE_BUFFER_OVERHEAD BLCKSZ
#define MERGE_BUFFER_SIZE (BLCKSZ * 32)
mOrder = (allowedMem - TAPE_BUFFER_OVERHEAD) /
(MERGE_BUFFER_SIZE + TAPE_BUFFER_OVERHEAD);
mOrder = Max(mOrder, MINORDER);
mOrder = Min(mOrder, MAXORDER);
*/
maxTapes = tuplesort_merge_order(state->allowedMem) + 1;
}
else
{
/* Workers can sometimes produce single run, output without merge */
//Worker進(jìn)程有時(shí)可能產(chǎn)生單個(gè)run,不需要?dú)w并直接輸出.
Assert(WORKER(state));
maxTapes = MINORDER + 1;
}
#ifdef TRACE_SORT
if (trace_sort)
elog(LOG, "worker %d switching to external sort with %d tapes: %s",
state->worker, maxTapes, pg_rusage_show(&state->ru_start));
#endif
/* Create the tape set and allocate the per-tape data arrays */
//創(chuàng)建tape集合并分配per-tape數(shù)據(jù)數(shù)組
inittapestate(state, maxTapes);
state->tapeset =
LogicalTapeSetCreate(maxTapes, NULL,
state->shared ? &state->shared->fileset : NULL,
state->worker);
state->currentRun = 0;
/*
* Initialize variables of Algorithm D (step D1).
* 初始化算法D的變量(step D1)
*/
for (j = 0; j < maxTapes; j++)
{
state->tp_fib[j] = 1;
state->tp_runs[j] = 0;
state->tp_dummy[j] = 1;
state->tp_tapenum[j] = j;
}
state->tp_fib[state->tapeRange] = 0;
state->tp_dummy[state->tapeRange] = 0;
state->Level = 1;
state->destTape = 0;
//變更狀態(tài)為T(mén)SS_BUILDRUNS
state->status = TSS_BUILDRUNS;
}
dumptuples
清除memtuples中的元組并寫(xiě)入初始run到tape中
/*
* dumptuples - remove tuples from memtuples and write initial run to tape
* 清除memtuples中的元組并寫(xiě)入初始run到tape中
*
* When alltuples = true, dump everything currently in memory. (This case is
* only used at end of input data.)
* 如alltuples為T(mén),dump內(nèi)存中的所有數(shù)據(jù).
* (僅適用于與輸入數(shù)據(jù)結(jié)束時(shí))
*/
static void
dumptuples(Tuplesortstate *state, bool alltuples)
{
int memtupwrite;
int i;
/*
* Nothing to do if we still fit in available memory and have array slots,
* unless this is the final call during initial run generation.
* 如果可以放入可用內(nèi)存中并且仍有數(shù)據(jù)slots,并且當(dāng)前不是在初始run產(chǎn)生時(shí)的最后一次調(diào)用,則返回
*/
if (state->memtupcount < state->memtupsize && !LACKMEM(state) &&
!alltuples)
return;
/*
* Final call might require no sorting, in rare cases where we just so
* happen to have previously LACKMEM()'d at the point where exactly all
* remaining tuples are loaded into memory, just before input was
* exhausted.
* 最后一次調(diào)用可能不需要排序,在極少數(shù)情況下,
* 我們恰好在輸入耗盡之前調(diào)用了LACKMEM()'d,此時(shí)所有剩余的元組都被加載到內(nèi)存中.
*
* In general, short final runs are quite possible. Rather than allowing
* a special case where there was a superfluous selectnewtape() call (i.e.
* a call with no subsequent run actually written to destTape), we prefer
* to write out a 0 tuple run.
* 通常來(lái)說(shuō),最后的runs很有可能較短.
* 與其允許存在一個(gè)額外的selectnewtape()函數(shù)調(diào)用(即沒(méi)有后續(xù)運(yùn)行實(shí)際寫(xiě)入到destTape的調(diào)用),
* 還不如編寫(xiě)一個(gè)0元組的run.
*
* mergereadnext() is prepared for 0 tuple runs, and will reliably mark
* the tape inactive for the merge when called from beginmerge(). This
* case is therefore similar to the case where mergeonerun() finds a dummy
* run for the tape, and so doesn't need to merge a run from the tape (or
* conceptually "merges" the dummy run, if you prefer). According to
* Knuth, Algorithm D "isn't strictly optimal" in its method of
* distribution and dummy run assignment; this edge case seems very
* unlikely to make that appreciably worse.
* mergereadnext()為0元組runs作準(zhǔn)備,在beginmerge()函數(shù)調(diào)用該函數(shù)時(shí)將標(biāo)記tape為非活動(dòng)狀態(tài).
* 這種情況與mergeonerun()為tape檢索到虛擬run類(lèi)似,因此不需要?dú)w并(如果你愿意,可以執(zhí)行名義上的歸并).
* 按照Knuth的說(shuō)法,算法D不是嚴(yán)格分布和虛擬run分配優(yōu)化的,但這種極端的情況不太可能讓情況很糟糕.
*/
Assert(state->status == TSS_BUILDRUNS);
/*
* It seems unlikely that this limit will ever be exceeded, but take no
* chances
* 越界了.
*/
if (state->currentRun == INT_MAX)
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("cannot have more than %d runs for an external sort",
INT_MAX)));
state->currentRun++;
#ifdef TRACE_SORT
if (trace_sort)
elog(LOG, "worker %d starting quicksort of run %d: %s",
state->worker, state->currentRun,
pg_rusage_show(&state->ru_start));
#endif
/*
* Sort all tuples accumulated within the allowed amount of memory for
* this run using quicksort
* 使用快速排序?qū)?nèi)存中的元組進(jìn)行排序.
*/
tuplesort_sort_memtuples(state);
#ifdef TRACE_SORT
if (trace_sort)
elog(LOG, "worker %d finished quicksort of run %d: %s",
state->worker, state->currentRun,
pg_rusage_show(&state->ru_start));
#endif
//寫(xiě)入到tape中
memtupwrite = state->memtupcount;
for (i = 0; i < memtupwrite; i++)
{
WRITETUP(state, state->tp_tapenum[state->destTape],
&state->memtuples[i]);
state->memtupcount--;
}
/*
* Reset tuple memory. We've freed all of the tuples that we previously
* allocated. It's important to avoid fragmentation when there is a stark
* change in the sizes of incoming tuples. Fragmentation due to
* AllocSetFree's bucketing by size class might be particularly bad if
* this step wasn't taken.
* 重置tuple內(nèi)存上下文.
* 目的是為了避免內(nèi)存碎片.
*/
MemoryContextReset(state->tuplecontext);
markrunend(state, state->tp_tapenum[state->destTape]);
state->tp_runs[state->destTape]++;
state->tp_dummy[state->destTape]--; /* per Alg D step D2 */
#ifdef TRACE_SORT
if (trace_sort)
elog(LOG, "worker %d finished writing run %d to tape %d: %s",
state->worker, state->currentRun, state->destTape,
pg_rusage_show(&state->ru_start));
#endif
//未完成所有元組的處理,分配新的tape
if (!alltuples)
selectnewtape(state);
}
N/A
Merge sort
Polyphase merge sort
Sorting Algorithms: Internal and External
網(wǎng)頁(yè)名稱(chēng):PostgreSQL源碼解讀(199)-查詢(xún)#114(排序#7-inittapes&dumptuples)
鏈接分享:http://bm7419.com/article46/jcsheg.html
成都網(wǎng)站建設(shè)公司_創(chuàng)新互聯(lián),為您提供動(dòng)態(tài)網(wǎng)站、電子商務(wù)、網(wǎng)站改版、企業(yè)網(wǎng)站制作、云服務(wù)器、App開(kāi)發(fā)
聲明:本網(wǎng)站發(fā)布的內(nèi)容(圖片、視頻和文字)以用戶(hù)投稿、用戶(hù)轉(zhuǎn)載內(nèi)容為主,如果涉及侵權(quán)請(qǐng)盡快告知,我們將會(huì)在第一時(shí)間刪除。文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如需處理請(qǐng)聯(lián)系客服。電話(huà):028-86922220;郵箱:631063699@qq.com。內(nèi)容未經(jīng)允許不得轉(zhuǎn)載,或轉(zhuǎn)載時(shí)需注明來(lái)源: 創(chuàng)新互聯(lián)