Category: Reading Notes

  • Replacing a failed disk in VMware vSAN

    Symptoms

    The vSAN datastore capacity shrank, and the host raised an alarm: vSAN data error.

    (more…)

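  • Revisiting the CAP theorem

    1. Consistency in CAP refers specifically to linearizability.
    2. CAP says nothing at all about latency. A system that satisfies CAP availability may take arbitrarily long to answer a request.
    3. The CAP system model is a register that can only read and write a single value. Transactions are outside the scope of the theorem.
    4. When designing a distributed system you have to consider many more issues. Focusing too much on CAP makes it easy to overlook other important ones.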

    (more…)

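  • Inspecting JVM memory with Linux top

    Use the Linux top command to inspect a process's memory.

    • Which top fields relate to memory
    • Why VIRT is sometimes larger than system memory
    • Why RES can be larger than the JVM's configured Xmx
    • What DATA means in top

    Memory-related fields in top

    CODE and DATA are hidden by default. Press F, then select them with the space bar to display them.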

    top -p 1210
    
    PID USER      PR  NI    VIRT    RES    SHR   CODE    DATA   SWAP S %CPU %MEM     TIME+ COMMAND
    1210 mysql     20   0 1158504 130804   7736  22472 1052816      0 S  0.0  7.0   8:53.50 mariadbd
    

    VIRT

    36. VIRT  --  Virtual Memory Size (KiB)
        The total amount of virtual memory used by the task.  It includes all code, data and shared libraries plus pages that have been swapped out and pages that have  been mapped but not used.
    

    VIRT = CODE + DATA + shared libraries + pages that have been swapped out + pages that have been mapped but not used

    SWAP

    27. SWAP  --  Swapped Size (KiB)
        The non-resident portion of a task's address space.
    

    The size of the memory pages that have been swapped out.

    RES

    17. RES  --  Resident Memory Size (KiB)
        The non-swapped physical memory a task is using.
    

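    The physical memory a task is using that has not been swapped out.

    For example, the program below shows that RES includes shared anonymous mmap memory, which is also counted in SHR.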

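    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>
    
    int main(void)
    {
        /* mmap 50MiB of shared anonymous memory */
        char *p = mmap(NULL, 50 << 20, PROT_READ | PROT_WRITE,
                       MAP_ANONYMOUS | MAP_SHARED, -1, 0);
        /* Stop early if the mapping could not be created */
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }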
    
        /* Touch every single page to make them resident */
        for (int i = 0; i < (50 << 20) / 4096; i++) {
            p[i * 4096] = 1;
        }
    
        /* Let us see the process in top */
        sleep(1000000);
    
        return 0;
    }
    
    gcc -std=gnu99 main.c
    ./a.out &
    pgrep a.out
    339065
    top -p 339065
    
    
        PID USER        VIRT    RES    SHR S  %CPU %MEM   CODE    DATA     TIME+ COMMAND
     338377 root       55412  51564  51476 S   0.0  0.1      4     180   0:00.01 a.out
    

    CODE

     4. CODE  --  Code Size (KiB)
        The amount of physical memory devoted to executable code, also known as the Text Resident Set size or TRS.
    
        In other words, the total physical memory occupied by resident executable code (the Text Resident Set, TRS).
    

    DATA

    6. DATA  --  Data + Stack Size (KiB)
       The amount of physical memory devoted to other than executable code, also known as the Data Resident Set size or DRS.
    

    The man page is inaccurate here; see the examples in this article:

    The DATA column contains the amount of reserved private anonymous memory. By definition, the private anonymous memory is the memory that is specific to the program and that holds its data. It can only be shared by forking in a copy-on-write fashion. It includes (but is not limited to) the stacks and the heap (But we will see later that it only partially contains the data segment of the loaded executables). This column does not contain any piece of information about how much memory is actually used by the program, it just tells us that the program reserved some amount of memory, however that memory may be left untouched for a long time.

    1. DATA includes the stacks and the heap, but is not limited to them.
    2. DATA does not tell us how much memory the program actually uses. It only tells us that the program has reserved some amount of memory, which may be left untouched for a long time.

    $$ANON = RES - SHR$$ (ANON is the memory allocated on the heap.)

    $$ANON \le DATA$$ (vm_physic)

    SHR

    21. SHR  --  Shared Memory Size (KiB)
            The amount of shared memory available to a task, not all of which is typically resident.  It simply reflects memory that could be potentially shared with other  processes.
    
            In short, SHR counts memory that could be shared with other processes, and not all of it is necessarily resident.
    

    SHR contains all virtual memory that could be shared with other processes, and RSS contains all memory physically in RAM that is used by the process.

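    Thus all shared memory currently in RAM is counted both in SHR and in RSS, so SHR + RSS has no meaning since it can contain duplicate counts.

    1. Besides the process's own shared memory, SHR also includes shared memory that belongs to other processes
    2. Even if the process uses only a few functions from a shared library, SHR counts the whole library
    3. Formula for the physical memory occupied by one process: RES - SHR
    4. SHR goes down after pages are swapped out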
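
    As a rough illustration of item 3, the sketch below applies RES - SHR to the two top rows shown earlier (values in KiB). The helper name `private_resident_kib` is made up for this example, and the result is only an estimate, since it simply subtracts one top column from another.

```python
def private_resident_kib(res_kib: int, shr_kib: int) -> int:
    """Estimate resident memory that is not shared: RES - SHR (KiB)."""
    return res_kib - shr_kib


# mariadbd row from top: RES=130804, SHR=7736
mariadbd_private = private_resident_kib(130804, 7736)

# a.out row from top: RES=51564, SHR=51476. Almost all of its resident
# memory is the 50 MiB shared anonymous mapping, so very little remains.
aout_private = private_resident_kib(51564, 51476)

print(mariadbd_private, aout_private)
```

    For a.out nearly everything cancels out, which matches the point that a shared anonymous mapping is counted in both RES and SHR.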

    Via the proc filesystem

    cat /proc/1210/statm
    289626 32701 1934 5618 0 263204 0

    // OS memory page size

    getconf PAGESIZE
    4096

    Table 1-3: Contents of the statm files (as of 2.6.8-rc3)

    | Field | Content | Related top field |
    |---|---|---|
    | size | total program size (pages) (same as VmSize in status) | $$VIRT=289626*4096/1024=1158504$$ |
    | resident | size of memory portions (pages) (same as VmRSS in status) | $$RES=32701*4096/1024=130804$$ |
    | shared | number of pages that are shared (i.e. backed by a file, same as RssFile+RssShmem in status) | $$SHR=1934*4096/1024=7736$$ |
    | trs | number of pages that are 'code' (not including libs; broken, includes data segment) | $$CODE=5618*4096/1024=22472$$ |
    | lrs | number of pages of library (always 0 on 2.6) | |
    | drs | number of pages of data/stack (including libs; broken, includes library text) | $$DATA=263204*4096/1024=1052816$$ |
    | dt | number of dirty pages (always 0 on 2.6) | |
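
    The conversion from statm page counts to the KiB values that top prints can be sketched as follows. The function name `statm_to_kib` is an assumption for this example, and the input is the statm line and page size shown above.

```python
def statm_to_kib(statm_line: str, page_size: int = 4096) -> dict:
    """Convert the page counts in a /proc/<pid>/statm line to KiB."""
    names = ["size", "resident", "shared", "trs", "lrs", "drs", "dt"]
    pages = [int(field) for field in statm_line.split()]
    return {name: count * page_size // 1024 for name, count in zip(names, pages)}


# The statm line shown above for PID 1210, with a 4096-byte page size.
kib = statm_to_kib("289626 32701 1934 5618 0 263204 0")
print(kib["size"], kib["resident"], kib["shared"], kib["trs"], kib["drs"])
```

    The five printed values correspond to VIRT, RES, SHR, CODE and DATA in the top output for the same process.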

    Via pmap

    ➜  ~ pmap -X 1210|head -n 5
    1210:   /usr/sbin/mariadbd
             Address Perm   Offset Device Inode    Size    Rss    Pss Referenced Anonymous Swap Locked Mapping
        556335aee000 r-xp 00000000  fd:01 22182   22472   5228   5228       5172         0    0      0 mariadbd
        5563372df000 r--p 015f1000  fd:01 22182    1392   1392   1392       1392      1392    0      0 mariadbd
        55633743b000 rw-p 0174d000  fd:01 22182     720    416    416        416       384    0      0 mariadbd
    ➜  ~ pmap -X 1210|tail -n 5
        7ffc7142e000 rw-p 00000000  00:00     0     132     76     76         76        76    0      0 [stack]
        7ffc714af000 r-xp 00000000  00:00     0       8      4      0          4         0    0      0 [vdso]
    ffffffffff600000 r-xp 00000000  00:00     0       4      0      0          0         0    0      0 [vsyscall]
                                                ======= ====== ====== ========== ========= ==== ======
                                                1158508 131308 128832     131136    123272    0      0 KB
    

    In computing, proportional set size (PSS) is the portion of main memory (RAM) occupied by a process and is composed of the private memory of that process plus the proportion of shared memory with one or more other processes. Unshared memory including the proportion of shared memory is reported as the PSS.
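
    A minimal sketch of that definition, using made-up numbers rather than the mariadbd figures: each shared mapping contributes its size divided by the number of processes sharing it. The function name and the sample values are assumptions for illustration only.

```python
def proportional_set_size(private_kib: float, shared_mappings) -> float:
    """PSS = private memory + each shared mapping divided by its sharers.

    shared_mappings is an iterable of (size_kib, number_of_sharing_processes).
    """
    return private_kib + sum(size / sharers for size, sharers in shared_mappings)


# Hypothetical process: 1000 KiB private, a 400 KiB library shared by
# 4 processes and a 300 KiB segment shared by 2 processes.
pss = proportional_set_size(1000, [(400, 4), (300, 2)])
print(pss)
```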

    JVM and Linux

    Why does the actual memory used by the JVM process still change after Xms and Xmx are set?

    G1 will try expand the heap if the amount of time you spend doing GC work versus application work is greater than a specific threshold. Note: If your min/max heap are the same, expansion cannot occur.

    In practice the heap size is already fixed, so the JVM heap will not expand any further.

    Linux gives every process the same virtual address space, which keeps processes independent of one another. It does this with virtual memory: each process is given a range of virtual memory, and physical memory is allocated only when that virtual memory is actually used.

    With -Xms10g -Xmx10g, the JVM asks the operating system at startup for 10 GB of memory to use as the heap.

    The operating system will try to allocate that memory for the JVM (it shows up as VIRT), but it does not promise to back it with physical memory; part of it may end up in swap.

    You will also find that VIRT is not exactly 10 GB. The 10 GB covers only the heap, and a JVM holds much more than the heap: thread stacks, permgen (replaced by Metaspace since JDK 8), native stacks, code, files, and so on.

    The JVM heap "used" size is larger than RES in top

    [root@node2 octopus]#  /usr/lib/jvm/java-11/bin/jhsdb jmap --heap --pid 31821
    Attaching to process ID 31821, please wait...
    Debugger attached successfully.
    Server compiler detected.
    JVM version is 11.0.11+9-LTS
    
    using thread-local object allocation.
    Garbage-First (G1) GC with 13 thread(s)
    
    Heap Configuration:
       MinHeapFreeRatio         = 40
       MaxHeapFreeRatio         = 70
       MaxHeapSize              = 19327352832 (18432.0MB)
       NewSize                  = 1363144 (1.2999954223632812MB)
       MaxNewSize               = 11593056256 (11056.0MB)
       OldSize                  = 5452592 (5.1999969482421875MB)
       NewRatio                 = 2
       SurvivorRatio            = 8
       MetaspaceSize            = 21807104 (20.796875MB)
       CompressedClassSpaceSize = 1073741824 (1024.0MB)
       MaxMetaspaceSize         = 17592186044415 MB
       G1HeapRegionSize         = 16777216 (16.0MB)
    
    Heap Usage:
    G1 Heap:
       regions  = 1152
       capacity = 19327352832 (18432.0MB)
       used     = 17792765976 (16968.503929138184MB)  # here
       free     = 1534586856 (1463.4960708618164MB)
       92.06002565721671% used
    G1 Young Generation:
    Eden Space:
       regions  = 11
       capacity = 872415232 (832.0MB)
       used     = 184549376 (176.0MB)
       free     = 687865856 (656.0MB)
       21.153846153846153% used
    Survivor Space:
       regions  = 8
       capacity = 134217728 (128.0MB)
       used     = 134217728 (128.0MB)
       free     = 0 (0.0MB)
       100.0% used
    G1 Old Generation:
       regions  = 1064
       capacity = 18320719872 (17472.0MB)
       used     = 17490776088 (16680.503929138184MB)
       free     = 829943784 (791.4960708618164MB)
       95.46991717684399% used
    
    [root@node2 octopus]#
    [root@node2 octopus]#
    [root@node2 octopus]# top -p 31821
    top - 16:54:33 up 21:52,  2 users,  load average: 0.00, 0.01, 0.05
    Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem : 49177608 total,   391092 free, 22373724 used, 26412792 buff/cache
    KiB Swap: 33554428 total, 33554428 free,        0 used. 26303500 avail Mem
    
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
      31821 root      20   0  159.4g  15.0g  25284 S   0.0 32.0   5:40.87 jsvc
    
    

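    The JVM heap reports used = 17792765976 (16968.503929138184MB), yet RES for the process in top is only 15.0g (RES - SHR). That looks odd at first, but adding up the objects in the JVM heap shows that they actually take only about 11 GB.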

    [root@node2 octopus]#  /usr/lib/jvm/java-11/bin/jmap -histo 31821 |head -n 5
     num     #instances         #bytes  class name (module)
    -------------------------------------------------------
       1:          3094    10086075648  [J (java.base@11.0.11)
       2:         90396     1081725104  [B (java.base@11.0.11)
       3:         11750      203173760  [I (java.base@11.0.11)
    [root@node2 octopus]#  /usr/lib/jvm/java-11/bin/jmap -histo 31821 |tail -n 5
    1661:             1             16  sun.util.locale.provider.TimeZoneNameUtility$TimeZoneNameGetter (java.base@11.0.11)
    1662:             1             16  sun.util.logging.internal.LoggingProviderImpl (java.logging@11.0.11)
    1663:             1             16  sun.util.resources.LocaleData$LocaleDataStrategy (java.base@11.0.11)
    1664:             1             16  sun.util.resources.cldr.provider.CLDRLocaleDataMetaInfo (jdk.localedata@11.0.11)
    Total        789125    11397387608 # here
    

    This shows that the real footprint of the JVM heap includes more than the live objects. Could it be the total size of all the regions used to store objects?
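
    The figures above can be cross-checked with a little arithmetic. The sketch below only reuses numbers from the jhsdb and jmap output; the variable names are made up, and the gap it computes does not by itself say what fills it.

```python
MIB = 1024 * 1024

region_size = 16777216        # G1HeapRegionSize
total_regions = 1152          # G1 Heap regions
heap_used = 17792765976       # G1 Heap "used" from jhsdb jmap --heap
histo_total = 11397387608     # "Total" bytes from jmap -histo

# Capacity is simply the number of regions times the region size.
capacity = total_regions * region_size

# Bytes counted as used by G1 but not accounted for by the histogram.
gap = heap_used - histo_total

print(f"capacity = {capacity / MIB:.1f} MiB")
print(f"gap      = {gap / MIB:.1f} MiB")
```

    The computed capacity equals MaxHeapSize, so the region accounting is consistent; the gap is the part that the region question above is asking about.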

    top VIRT and RSS

    When is Virtual Memory Size Important?

    The virtual memory map contains a lot of stuff.

    • Some of it is read-only,

    • some of it is shared,

    • and some of it is allocated but never touched (eg, almost all of the 4Gb of heap in this example).

    But the operating system is smart enough to only load what it needs, so the virtual memory size is largely irrelevant.

    Where virtual memory size is important is if you’re running on a 32-bit operating system, where you can only allocate 2Gb (or, in some cases, 3Gb) of process address space. In that case you’re dealing with a scarce resource, and might have to make tradeoffs, such as reducing your heap size in order to memory-map a large file or create lots of threads. (Older machines were 32-bit, so logical addressing could reach at most 4 GB of memory; after subtracting what the system reserves, a process on most machines could access only 3 GB.)

    But, given that 64-bit machines are ubiquitous, I don’t think it will be long before Virtual Memory Size is a completely irrelevant statistic.

    When is Resident Set Size Important?

    Resident Set size is that portion of the virtual memory space that is actually in RAM. If your RSS grows to be a significant portion of your total physical memory, it might be time to start worrying. If your RSS grows to take up all your physical memory, and your system starts swapping, it’s well past time to start worrying.

    But RSS is also misleading, especially on a lightly loaded machine. The operating system doesn’t expend a lot of effort reclaiming the pages used by a process. There’s little benefit to be gained by doing so, and the potential for an expensive page fault if the process touches the page in the future. As a result, the RSS statistic may include lots of pages that aren’t in active use.

    Memory – Part 2: Understanding Process memory

    The /proc Filesystem

    Proportional set size

    Linux and JVM memory

  • GLIBCXX_USE_CXX11_ABI

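    The code compiled and ran fine on CentOS 7, but colleagues were using that environment for testing. I happened to have an idle RHEL 8 machine on hand.

    The build on RHEL 8 succeeded, but the program crashed as soon as it ran.

    The log showed: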

    fusionsphere/so/libfc.so: undefined symbol: _ZN16CFusionSphereSDK10InitialSDKEPFviPKczEP23CFusionSphereDebugLevelRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESD_SD_
    

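    Yet the program clearly compiled without errors, and it does not even use RTTI-style features such as generics. It is obviously not a segmentation fault caused by a bad memory access either.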
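
    One hint is in the mangled name itself: it contains `St7__cxx11`, the `std::__cxx11` namespace that libstdc++ uses for `std::string` when `_GLIBCXX_USE_CXX11_ABI` is 1. A minimal sketch of that check, with a made-up helper name, might look like this:

```python
def uses_cxx11_abi(mangled_symbol: str) -> bool:
    """Return True if an Itanium-mangled name refers to std::__cxx11."""
    # "St" is the std:: prefix and "7__cxx11" is the length-prefixed
    # namespace name __cxx11.
    return "St7__cxx11" in mangled_symbol


# The undefined symbol reported in the log above.
symbol = (
    "_ZN16CFusionSphereSDK10InitialSDKEPFviPKczEP23CFusionSphereDebugLevel"
    "RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESD_SD_"
)
print(uses_cxx11_abi(symbol))
```

    A symbol that is requested with the new-ABI string type but exported by a library built with the old ABI would have a different mangled name, which is consistent with an undefined-symbol error that only appears at run time.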

    (more…)

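  • Lessons from MySQL tuning

    Tuning should focus on the key points. Caches have their own drawbacks. How to make a cache more valuable. Solving problems at the hardware level.

    Three principles of MySQL administration and optimization

    1. Accessing data in memory is faster than accessing data on disk
    2. Keep data in memory and reduce disk activity as much as possible
    3. Keeping index information is more important than keeping row contents

    Basic tuning approach

    1. Change one parameter at a time. If several variables are changed together, it is hard to evaluate the effect of each change
    2. Increase system variable values gradually.
      1. Avoid using up system resources all at once
      2. Avoid the system thrashing or slowing to a crawl because a variable was set too high
    3. Do not run tests in the production environment
    4. Disable storage engines you do not need, to reduce memory usage.
    5. Keep access privileges simple. Besides the user table, the MySQL grant tables include db, tables_priv (privileges on individual tables), columns_priv (privileges on individual columns) and procs_priv (privileges on stored procedures and stored functions). Once these are populated, the server has to check their contents every time it verifies the privileges for an SQL statement.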

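    Commonly adjusted general-purpose system variables

    1. back_log

      The maximum number of connection requests that can wait in the queue while current connections are being processed.

    2. max_connections

      The maximum number of concurrent client connections the server supports. Use show status to check the Max_used_connections variable.

    3. table_open_cache

      When the server opens table files, it tries to keep them open to reduce the number of file open and close operations it has to perform. Use show global status like 'Opened_tables' to evaluate it.

      1. If Opened_tables grows quickly, the cache is too small
      2. A larger cache reduces cache misses
    4. table_definition_cache

      Related to table_open_cache. It controls the size of the cache that stores table definitions.

    5. open_files_limit

      1. Take into account the operating system's limit on file handles per process
      2. MySQL also has its own open_files_limit setting
    6. max_allowed_packet

      The maximum size of the buffer used for client communication. The default is 1MB and the maximum allowed value is 1GB. You may also need to increase the following two variables accordingly

    • read_buffer_size: size of the buffer used by read operations
    • sort_buffer_size: size of the buffer used by sort operations

      Every session is affected by them, so adjust them gradually.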

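    InnoDB buffer pool

    Basic structure

    1. It is a list, split into a new sublist and an old sublist
    2. It uses a modified LRU policy: newly read blocks are inserted at the head of the old sublist
      1. If a block is needed by a query, it is accessed and then moved to the head of the new sublist
      2. If a block was loaded by read-ahead, it may never be accessed, which limits the impact of read-ahead on the cache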
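
    The two sublists can be sketched with a toy model. This is not InnoDB's implementation: the class name `MidpointLRU`, the way the old sublist is sized and the eviction details are simplifying assumptions made for illustration.

```python
from collections import deque


class MidpointLRU:
    """Toy LRU list with a new (hot) and an old (cold) sublist."""

    def __init__(self, capacity: int, old_pct: int = 37):
        self.capacity = capacity
        # Slots reserved for the old sublist, at least one.
        self.old_target = max(1, capacity * old_pct // 100)
        self.new = deque()  # head is index 0
        self.old = deque()  # head is index 0, tail is evicted first

    def _rebalance(self):
        # The new sublist may hold at most capacity - old_target pages;
        # its coldest page is demoted to the head of the old sublist.
        while len(self.new) > self.capacity - self.old_target:
            self.old.appendleft(self.new.pop())
        # Evict from the tail of the old sublist when over capacity.
        while len(self.new) + len(self.old) > self.capacity:
            self.old.pop()

    def load(self, page):
        """Read a page in (for example by read-ahead): head of old sublist."""
        if page in self.new or page in self.old:
            return
        self.old.appendleft(page)
        self._rebalance()

    def access(self, page):
        """A query touches the page: move it to the head of the new sublist."""
        if page in self.new:
            self.new.remove(page)
        elif page in self.old:
            self.old.remove(page)
        self.new.appendleft(page)
        self._rebalance()


cache = MidpointLRU(capacity=4, old_pct=50)
cache.access("hot")                      # used by a query
for page in ["r1", "r2", "r3", "r4", "r5"]:
    cache.load(page)                     # read-ahead pages, never accessed
hot_survived = "hot" in cache.new
old_after_scan = list(cache.old)
cache.access("r4")                       # a query finally needs r4
new_after_access = list(cache.new)
print(hot_survived, old_after_scan, new_after_access)
```

    In this run the read-ahead pages only push each other out of the old sublist, while the page that a query actually used stays in the new sublist.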

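    Commonly tuned parameters

    • innodb_buffer_pool_size

      Size of the buffer pool, in bytes

    • innodb_buffer_pool_instances

      If innodb_buffer_pool_size >= 1GB and innodb_buffer_pool_instances > 1, InnoDB splits the buffer pool into several smaller buffer pool instances. Pages are assigned to instances at random, which reduces concurrency contention.

    • Parameters that affect eviction from the buffer pool

      • innodb_old_blocks_pct: the percentage of the buffer pool used by the old sublist. The default is 37.
      • innodb_old_blocks_time: how many milliseconds a block must stay in the old sublist after its first access before a later access can move it to the new sublist. (Setting it above 0 keeps one-off bulk accesses such as table scans from having too much impact on the buffer pool.)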

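    Consider enabling the MySQL query cache

    • After a select statement runs, the server remembers the text of the query and the result it returned
    • The query cache decides whether there is a hit by comparing the text of the query
    • Queries whose result is not deterministic are not stored, for example those using the NOW() function
    • When a table is modified, every cached query that refers to it is invalidated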
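
    The behaviour in the list above can be sketched with a small in-memory model. This is a toy, not MySQL's implementation: the class name `QueryCache`, the caller-supplied table list and the short list of non-deterministic functions are all assumptions for illustration.

```python
class QueryCache:
    """Toy query cache keyed by the exact query text."""

    NON_DETERMINISTIC = ("NOW(", "RAND(")  # illustrative, not exhaustive

    def __init__(self):
        self.results = {}    # query text -> cached result
        self.by_table = {}   # table name -> set of query texts

    def get(self, sql: str):
        # A hit requires the text to match exactly.
        return self.results.get(sql)

    def put(self, sql: str, tables, result):
        # Skip queries whose result can differ between runs.
        if any(token in sql.upper() for token in self.NON_DETERMINISTIC):
            return
        self.results[sql] = result
        for table in tables:
            self.by_table.setdefault(table, set()).add(sql)

    def invalidate(self, table: str):
        # Any change to a table drops every cached query that reads it.
        for sql in self.by_table.pop(table, set()):
            self.results.pop(sql, None)


cache = QueryCache()
cache.put("SELECT * FROM t WHERE id = 1", ["t"], [(1, "a")])
cache.put("SELECT NOW()", [], ["placeholder"])

hit = cache.get("SELECT * FROM t WHERE id = 1")
miss = cache.get("select * from t where id = 1")  # same query, different text
uncached = cache.get("SELECT NOW()")

cache.invalidate("t")                             # table t was modified
after_write = cache.get("SELECT * FROM t WHERE id = 1")
print(hit, miss, uncached, after_write)
```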

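    Drawbacks

    • Allocating too much memory to the query cache makes it take too long to check whether the current query hits the cache.
    • With many client connections it introduces contention on the cache (because the query cache can only be operated on by one thread at a time)

    Hardware

    • More memory
    • Faster and more processors
    • Use faster disks
      • Use SSDs throughout
      • Build RAID arrays from mechanical disks
      • Spread disk activity across different physical devices to take full advantage of parallelism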