ä»åã¯2æšæ¬ã®tæ€å®ãåãäžããŸãã
ããŒã¿ã¯Sample SuperStoreã䜿çšããŸãã
ãŸãä»å䜿çšããWorkbookã¯ä»¥äžããããŠã³ããŒãã§ããŸãã
åè
Introductory Statistics with R (å€ãæ¬ã§ããçµ±èšåŠã®Rå®è£ ãåŠã¶ã«ã¯ããæ¬ã§ããããã®ãã¡ããå°ãæè¿ã®æ¬ãåç §ããããã«ããŸãïŒ
2æšæ¬ã®tæ€å®ãšã¯ïŒè©³çŽ°ã¯å²æïŒ
詳ãã話ã¯åèè³æãèŠãŠé ããšããŠããã£ããèšãã°ã2ã€ã®ããŒã¿ã®å¹³åå€ã®éããçµ±èšçã«ææãã©ãã調ã¹ããããã®ææ³ãšç解ããŠããŸãã
äŸãã°ä»åã¯SuperStoreã®ãWestãšEastã®Order IDããšã®åèšSalesã®å¹³åå€ããæ¯èŒããŸãããã®çµæã¯ä»¥äžã®ããã«ãçŽ$34ã®å·®ããããŸããã
ãã®$34ã¯çµ±èšçã«ææãªå·®ãšèšããã®ãã©ããïŒå³å¯ã«ã¯äž¡ããŒã¿ã®å¹³åå€ã®å·®ããŒãã§ãããšãã仮説ãæ£åŽããããã©ããïŒãç¥ãããã«2æšæ¬tæ€å®ã䜿çšããŸãã
å®éã«ã¯ããŒã¿ãæ£èŠååžãã©ããã2ã€ã®ããŒã¿ã®åæ£ãçããããæ°ã«ãããã¹ããªã®ã§ããããŸãã¯å®è£ ãèããŸãããã
tæ€å®ã®å®è£
以éã2æšæ¬tæ€å®ãåã«tæ€å®ãšåŒã³ãŸãã
ãŸãã¯ä»åæ±ãããŒã¿ãèŠãŠã¿ãŸãã
ä»åã®å ¥åã«ã¯ãOrderIDããšã®åèšå£²äžãã䜿çšãããã®ã§ãOrder IDãViz-LODã«å«ããŸãã
ãã®ãããã®å 容ã«ã€ããŠã¯ãã¡ãã®èšäºãåç §ãã ããã
次ã«ãèšç®ãã£ãŒã«ãã§tæ€å®ã®çµæãè¿ãå®è£ ãèããã®ã§ããããŸãã¯Räžã§tæ€å®ã®å®è¡çµæãã©ã®ãããªåœ¢ã§åŸãããã®ãã確èªããŸãã
ïŒçåæ£ãã©ããã®ç¢ºèªã¯åŸçšè¡ããšããŠããŸãã¯çåæ£ãä»®å®ããå Žåã®tæ€å®ã®çµæãèŠãŠã¿ãŸãïŒ
äžèšã¯R Studioäžã§west: RegionãWestã®OrderIDããšã®åèšSalesã®ããŒã¿ãeast: RegionãEastã®OrderIDããšã®åèšSalesã®ããŒã¿ãçåæ£ãä»®å®ããtæ€å®ã«ãããçµæã§ãã
Tableauã®å®è£ äžã§å€§äºãªããšã¯2æç®ã§ãtæ€å®ã¯åºåããªã¹ãåã§åºããŸãã
ä»åTableauã«ã¯på€ïŒããŒã¿ã®å¹³åå€ã®å·®ããŒãã§ãã確çïŒã欲ããã®ã§ãäžèšãèŠããšpå€ã¯ä»¥äžã®ãããªå®è£ ããåŸãããããšãåãããŸãã
p_value = t.test(west$Sales,east$Sales,var.equal=T)$p.value
æ¹éã¯åãã£ãã®ã§ãTableauäžã§å ¥åãã¯ãã«ãåèšç®ãã£ãŒã«ãã§æãããã°è¯ããããªã®ã§ãTableauã§ã®å®è£ ãèããŠãããŸãã
ãŸããå ¥åãã¯ãã«ã®ãµã€ãºã確ãããŸãã
ä»åã¯åRegionã®Order IDããšã®åèšSalesã«ã€ããŠæ±ãããã®ã§ãCOUNTD(Order ID)ããã®ãŸãŸæ³å®ããããã¯ãã«ãµã€ãºïŒtæ€å®ã«äœ¿ããµã³ãã«æ°ïŒã«ãªããŸãã
ãšããããšã§ã以äžã®ãããªèšç®åŒãšãã¥ãŒãäœæãããŸãå ¥åãã¯ãã«ã®ãµã€ãºãèŠãŠã¿ãŸãããã
West SalesïŒEast Salesãåæ§)
IF [Region] = 'West'
THEN [Sales]
END
vector size (west)ïŒEastãåæ§)
SCRIPT_REAL(
'length(.arg1)'
, SUM([West Sales])
)
ã©ãã«ã«ã¯å¯Ÿå¿ããvector sizeã䜿çšããŠããŸãã
ãŸãè¡šèšç®ã¯Order IDã䜿çšããŠèšç®ãããŸãã
ããã§VizLODã«Regionã䜿çšããŠããªãããšã«æ³šæããŠãã ããã
äºã€ã®ããŒã¿ã¯Measure Namesã§åããŠããŸãïŒããã¯ããŒã¿ãLODã®æå³ã§åããªãã®ã§ããã®ãã¥ãŒã®Viz-LODã¯Order IDã®ã¿ã§ãïŒã
ããŠãæ¬æ¥East/Westã«å¯ŸããŠ470/538ãæ³å®ãããŠããã®ã«å¯ŸããŠãå ¥åãã¯ãã«ã¯äºã€ã®å€ã®åèšå€ãè¿ããŠããŸãããããã¯äœæ ã§ããããïŒ
çãã¯ãCOUNTDãèšç®ããè¡šãšéããäžã®ãã¥ãŒã§ã¯RegionãViz-LODã«ãªããããã§ããã€ãŸãSCRIPT_REALã§Rã«å ¥åããããã¯ãã«ã¯ãèšç®åŒäžã§Regionããšã«åããããšããŠããå®éã«ã¯ããŒã¿ãRegionã§åããããŠããªããããäž¡Regionã«å«ãŸããOrder IDå šãŠãå ¥åãããã®ã§ãäžèšã®ããã«ãã¯ãã«ãµã€ãºã1008ãšãªããŸãã
tæ€å®ã®åŒïŒãšããããtå€ã®åŒïŒã«ã¯2ããŒã¿ã®ãµã³ãã«ãµã€ãºã䜿çšãããããå ¥åãã¯ãã«ã®ãµã€ãºã¯æ£ãããªããã°ãããŸããã
ãããã£ãŠãvector sizeã®åŒãä¿®æ£ããå¿ èŠããããŸãã
vector size (west)ïŒä¿®æ£çïŒ
SCRIPT_REAL(
'length(subset(.arg1,!is.na(.arg1)))'
, SUM([West Sales])
)
ä¿®æ£çã§ã¯ãåå ¥åãããNA: æ¬ æå€ãæãããã¯ãã«ã䜿çšãããããã®èšè¿°ã足ããŠããŸãã
ïŒãšããã§TableauããRã«å ¥ãéã®æ¬ æå€ã¯NULLã§ãªãNAãªã®ã§ããïŒ
å®éã«ä¿®æ£çã®çµæãèŠãŠã¿ããšã以äžã®ããã«ãªããŸãã
ãããã«å ¥åãã¯ãã«ã®ãµã€ãºãæåŸ ããŠãããã®ãšäžèŽããŸããã
æ£ããå ¥åãã¯ãã«ãåŸãããã®ã§ãå®éã«tæ€å®ã§ã®på€ãåºããŠã¿ãŸãã
p-value of t_test
SCRIPT_REAL(
't.test(
subset(.arg1,!is.na(.arg1))
, subset(.arg2,!is.na(.arg2))
, var.equal=T)$p.value'
, SUM([East Sales]), SUM([West Sales])
)
R Studioã§ã®å®è¡çµæãšäžèŽããŠããã®ã§ãæ£ããçµæãåŸãããŠããããšãåãããŸãã
äœè«ïŒå®ã¯äœ¿ãé¢æ°ã«ãã£ãŠã¯èªåã§NAãçããŠããã
subset()ã䜿çšããŠæ¬ æå€ãçããŠæ£ãããã¯ãã«ãå ¥åããŸãããããšããè°è«ãäžã§ããŸããããå®ã¯ä»åã®ã±ãŒã¹ã§ã¯æ¬ æå€åŠçãããªããŠãæ£ããpå€ãè¿ã£ãŠããŸãã
ãšããã®ããt.test()ã«ã¯na.actionãšãããã©ã¡ãŒã¿ãããããã¡ãæ¬ æå€ãã©ãåŠçããããæå®ãããã©ã¡ãŒã¿ã§ãã
ããã©ã«ãã§ã¯æ¬ æå€ãçããŠæ€å®ããããã«ãªã£ãŠããã®ã§ãããããããå ¥åãããçšåºŠã³ã³ãããŒã«ããæ¹ãè¯ãã®ã§ã¯ããšæãäžèšã®è°è«ãæžããŸããã
ããŠãäžèšãŸã§ã§ã2019幎床ã®Order IDããšã®Salesã®å¹³åå€ã¯ãEastãšWestã§ææãªå·®ããããããTableauäžã§ç¢ºãããããšãã§ããããã«ãªããŸããã
ãããŸã§ã®å 容ãå ã«ããŠãæ次ã§ææãªå·®ããã£ããã確ããããããã¥ãŒãäœæã§ããŸãã
è¡šèšç®ã®èšå®å«ãã詳现ã¯WorkbookãããŠã³ããŒãé ããã°ãšæããŸãã
çåæ£æ§ã®æ€å®
ãšããã§ããããŸã§æ¬åœã«WestãšEastã§Order IDããšã®Salesã«ãããåæ£ãçããã®ãã確ãããŠããŸããã§ããã
çåæ£æ§ã確ãããã«ã¯Fæ€å®ã䜿çšããŸãã
Rã«ã¯ç°¡åã«çåæ£æ§ã確ãããé¢æ°var.test()ãããã®ã§ãå®éã«äœ¿çšããŠã¿ãŸãããã
p-value of F_test
SCRIPT_REAL(
'var.test(
subset(.arg1,!is.na(.arg1))
, subset(.arg2,!is.na(.arg2))
, var.equal=T)$p.value'
, SUM([East Sales]), SUM([West Sales])
)
p_value < 0.05 (F_test)
//H1: true difference in means is not equal to 0
IF [p-value of F_test] < 0.05
THEN 'Diff. Var.'
ELSE 'Same Var.'
END
èŠãŠã¿ããšãçµæ§çåæ£æ§ã¯æã«ãã£ãŠå€ããããã§ãã
ãããã£ãŠçåæ£æ§ã«åãããŠtæ€å®ã®äžèº«ãå€ããå®è£ ãå¿ èŠããã§ãã
ãšããã§ãtæ€å®ã®çåæ£æ§ä»®å®ã¯t.test()é¢æ°ã®var.equalãã©ã¡ãŒã¿ã§å¶åŸ¡ããŸãã
ãããã£ãŠFæ€å®ã®çµæãT/Fã§æž¡ãèšç®ãã£ãŒã«ããçšæããŠãããã°è¯ããšããããšã«ãªããŸãã
F-test Result
//H1: true difference in means is not equal to 0
IF [p-value of F_test] < 0.05
THEN 'F' //H0 is rejected
ELSE 'T' //H0 is not rejected
END
p-value of t_testïŒä¿®æ£çïŒ
SCRIPT_REAL(
't.test(
subset(.arg1,!is.na(.arg1))
, subset(.arg2,!is.na(.arg2))
, var.equal=' + [F-test Result] +')$p.value'
, SUM([East Sales]), SUM([West Sales])
)
ãšããã§çµè«ããèšãã°ãçåæ£æ§ã«ãããããtæ€å®ã®çµæã¯å€ãããŸããã§ããã
äžèšãã¥ãŒã§ã¯EastãšWestã®Order IDã®åå¥ã«ãŠã³ããè¿œå è¡šèšããŠããã®ã§ãããèŠãéã
å¹³åå€ã®å·®ã倧ããåºãŠããŠãOrderæ°ïŒãµã³ãã«æ°ïŒãå°ãªãã®ã§ãçµæçã«tæ€å®ã§ææãªå·®ãåºã«ããïŒãã€æ£èŠååžã®éçšããã«ããïŒæ°ã«ãªãæ¹ã¯tæ€å®ã®ããã¹ãæ§ãšäžå¿æ¥µéå®çãã調ã¹äžããïŒã
ãµã³ãã«æ°ã倧ãããŠããããããã®å¹³åå€å·®ãå°ããã®ã§å·®ãææã§ãªããšå€æãããã
äžèš2ç¹ãåºãŠããã®ããªãšãçµè«ãšããŠã¯ãæ¬è³ªçã«ã¯EastãšWestã§ã¯Orderããšã®å¹³åSalesã«ææãªå·®ã¯åºæ¬çã«ã¯ç¡ãããã§ãã
ãããRegionå šéšã®å·®ã®æææ§ãèŠãããšãã¯ã©ããããã ããšãã話ã¯å¥ã®æ¥ã«ã
æ°ã«ãªãæ¹ã¯åæ£åæïŒANOVAïŒã§ã調ã¹äžãããè¿ã ããã°ãæžããŸãã
2020/7/1è¿œèšïŒç°åžžå€ã®åŠçã«ã€ããŠ
ä»åã®æ¬é¡ã§ãã£ãtæ€å®ã®å®è£ èªäœã¯äžèšã§å®çµããŠããã®ã§ãããäžèšã®çµæã®åŠ¥åœæ§ã«ã€ããŠã¡ãã£ãšç¢ºèªãããããšãåºæ¥ãã®ã§ãããã«è¿œèšãšããŠæžããŠãããŸãã
ïŒæžããªããå®éšããŠããŸãïŒ
ãŸããå·®ããããªãã«åºã€ã€ã2ããŒã¿éã®å¹³åå€ã®å·®ã¯ææã§ãªãããšããçµæãåºãŠããããã§ãããä»®ã«tæ€å®ãå®æœããéã®ãµã³ãã«æ°ãå¢ããããã©ããªãã®ã§ããããïŒã€ãŸããµã³ãã«æ°ãå¢ããã°ãããŒã¿ã®åæ£ã¯å¯Ÿå¿ããŠå°ãããªã£ãŠãããã®ã§ããããïŒ
以äžã«ååæããšã«ããŒã¿ã®éèšãå€ãããã®ãèšèŒããŸãã
ãµã³ãã«æ°ã¯ååããã§ããããã¯ãå¹³åå€ã®å·®ã«çµ±èšçææãã¯åºãŸããã
å°ãåç¹ã«ç«ã¡è¿ããå®éã®ããŒã¿ã®ååžãã¿ãŠã¿ãŸãïŒäŸãšããŠ2018幎ã衚瀺ïŒã
Box Plotãšå¹³åå€ã衚瀺ããŠããŸãã
ååžã䌌éã£ãŠããã®ã§ç¢ºãã«ææãã¯ç¡ãããã§ããããã以äžã«ååžãåºãã£ãŠããããšãæ°ã«ãªããŸããããã¯äŸãã°é¡§å®¢Segmentãªã©ã«ãã圱é¿ã§ããããïŒå°ãèŠãŠã¿ãŸãããã
ïŒã¡ãªã¿ã«ä»åã¯tæ€å®ã®å®è£ ãšããæèããå ¥ã£ãã®ã§æåã«ããŒã¿ã確èªããŸããã§ããããããããããšãããéã¯æåã«ããŒã¿ã®ååžãèŠãªããšãããŸãããïŒ
ããŸãéããèŠããŸãããããã以äžã«ã以äžã«åèšSalesã倧ããããŒã¿ãã«åŒã£åŒµãããŠããããã«ãèŠããŸãã
ãã®ãããªååžããå€ããããŒã¿ãç°åžžå€ãªã©ãšåŒã¶ã®ã§ããããµã³ãã«æ°ã決ããŠå€ãã¯ãªãããŒã¿ã«ãããå¹³åå€ïŒãšåæ£ïŒã¯ç°åžžå€ã®åœ±é¿ãåããããããã¡ããtæ€å®ã®çµæã«ã圱é¿ããŸãã
ç°åžžå€åŠçã®æ¹æ³ã¯ããã€ãããã®ã§ãããä»åã¯Tableauã§å®è£ ããããååäœïŒQuartileã䜿çšããŸãã
ã€ãŸãBox Plotã®äžç«¯ãšäžç«¯ãè¶ ããããŒã¿ãçããããªå®è£ ãèããŸãã
Inter Quatile Range
WINDOW_PERCENTILE(SUM([Sales]),0.75)
- WINDOW_PERCENTILE(SUM([Sales]),0.25)
Outlier Label
SUM([Sales]) > WINDOW_PERCENTILE(SUM([Sales]),0.75) + 1.5 * [Inter Quatile Range]
OR
SUM([Sales]) < WINDOW_PERCENTILE(SUM([Sales]),0.25) - 1.5 * [Inter Quatile Range]
å³äžã®èµ€ãç¹ãçãããOrderããšã®ååžã¯ã©ããªãããã§ããããïŒèŠãŠã¿ãŸãããã
ããã€ãã®ååæã§ã¯ãããããããææå·®ãåºããã§ããã
ãã®ç°åžžå€åŠçãtæ€å®ã®åŒã®äžã«çµã¿èŸŒãæ¹æ³ã以éã§èããŠãããŸãã
ãŸãã¯ãã¯ãã«ãµã€ãºã確èªããå®è£ ãæ£ãããã確èªããŸãã
vector size (east) (outlier) (Westãåæ§)
SCRIPT_REAL(
'length(subset(.arg1,!is.na(.arg1)))'
, IF NOT [Outlier Label] THEN SUM([East Sales]) END
)
åã©ãã«ã®äžåŽãç°åžžå€ãå«ãããã¯ãã«ãµã€ãºãäžãç°åžžå€é€å€ãããã¯ãã«ãµã€ãºã§ããç°åžžå€ã®åæ°ã確ãããŸãããåã£ãŠããŸããã
Tableauã®æäœã®é åºã«ãããšè¡šèšç®ãã£ã«ã¿ãŒã¯è¡šèšç®ã®åŸã«èµ°ã£ãŠããŸãã®ã§ãè¡šèšç®ã®äžã«IFæã§ã®ãã£ã«ã¿ãªã³ã°ãå®è£ ãã圢ã«ãªããŸãã
ãã®IFæ蟌ã¿ã®èšç®åŒãå ¥åãã¯ãã«ã«ãªããsubset()ã®äžã§æ¬ æå€ã¯é€å€ãããã®ã§ãçµæçã«ç°åžžå€æãã®å ¥åãã¯ãã«ãã€ã³ãããã«ãªãããšããç®æ®µã§ãã
ãšããããšã§ããã£ãštæ€å®ã®åŒãæ¹è¯ã§ããŸãã
F-test Result (outlier)
//H1: true difference in means is not equal to 0
IF [p-value of F_test (outlier)] < 0.05
THEN 'F' //H0 is rejected
ELSE 'T' //H0 is not rejected
END
p-value of F_test (outlier)
SCRIPT_REAL(
'var.test(
subset(.arg1,!is.na(.arg1))
, subset(.arg2,!is.na(.arg2))
, var.equal=T)$p.value'
, IF NOT [Outlier Label] THEN SUM([East Sales]) END
, IF NOT [Outlier Label] THEN SUM([West Sales]) END
)
p-value of t_test (outlier)
SCRIPT_REAL(
't.test(
subset(.arg1,!is.na(.arg1))
, subset(.arg2,!is.na(.arg2))
, var.equal=' + [F-test Result (outlier)] +')$p.value'
, IF NOT [Outlier Label] THEN SUM([East Sales]) END
, IF NOT [Outlier Label] THEN SUM([West Sales]) END
)
p_value < 0.05 (t_test) (outlier)
// H1: true difference in means is not equal to 0
IF [p-value of t_test (outlier)] < 0.05
THEN 'â '
ELSE ''
END
ãã¥ãŒã®çŽ°ããäœãã¯WorkbookãèŠãŠé ããšããŠãçµæã以äžã«èšèŒããŸãã
ç¹ã«ååæããšã®ïŒSegmentã§åããªãïŒå¹³åSaleså·®ã¯çµ±èšçæææ§ãïŒã€ã®ååæã§åºãããã«ãªããŸããããã¡ããçµ±èšçæææ§ãåºãããšãç®çã§ã¯ç¡ãã®ã§ããã
æ¬ æå€ãé€å€ããåŠçã¯ãã¯ãå®è£ ããŠãããæ¹ãããã®ããªãšæãã€ã€ããã®ãããã§è¿œèšãçå°ãããããšã«ããŸãã
æåŸã«
ä»åã¯tæ€å®ã«ã€ããŠåãäžããŸããããããŠè¿œèšã§ã¯æ¬ æå€åŠçã«ãå°ã 觊ããŸããã
æ¬åœã¯æ£èŠååžãä»®å®ããªãå Žåã®å®è£ ã«ã€ããŠãèæžãããèããã®ã§ãããFæ€å®ãããã§ã¹ã¿ãããåããŸãããæ°ã«ãªãæ¹ã¯ã調ã¹äžããã
å®è£ ã¯ãããªã«é£ããã¯ãªããªããšæããŸãã
ã質åçã¯TwitterãLinkedinãžãããããé¡ãããŸããããã§ã¯ã